Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The "NutchHadoopTutorial" page has been changed by TejasPatil:
https://wiki.apache.org/nutch/NutchHadoopTutorial?action=diff&rev1=42&rev2=43

  {{{
  cd $HADOOP_HOME
- hadoop jar apache-nutch-${version}.job org.apache.nutch.crawl.Crawl urls -dir crawlDir -depth 3 -topN 5
+ hadoop jar apache-nutch-${version}.job org.apache.nutch.crawl.Crawl urls -dir crawl -depth 3 -topN 5
  }}}

  We are using the nutch crawl command. The urls argument is the urls directory that we added to the distributed filesystem. "-dir crawl" sets the output directory, which is also written to the distributed filesystem. "-depth 3" means the crawl will only follow links 3 pages deep. There are other options you can specify; see the command documentation for those options.
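
The following is a minimal sketch of how the command above could be assembled and inspected before it is submitted to the cluster. It is not part of the wiki page: the names DRY_RUN and build_crawl_cmd are hypothetical, and "VERSION" is only a placeholder for whatever Nutch release you actually built. By default the script only prints the command.

```shell
#!/bin/sh
# Sketch only. DRY_RUN and build_crawl_cmd are illustrative names,
# not something the tutorial defines.
version="${version:-VERSION}"   # placeholder; set to your Nutch release
DRY_RUN="${DRY_RUN:-1}"         # 1 = print the command, do not run it

build_crawl_cmd() {
    # $1 = seed directory on the distributed filesystem
    # $2 = output directory on the distributed filesystem
    # $3 = value for -depth, $4 = value for -topN
    echo "hadoop jar apache-nutch-${version}.job org.apache.nutch.crawl.Crawl $1 -dir $2 -depth $3 -topN $4"
}

# Same arguments as the changed line in the diff above.
cmd=$(build_crawl_cmd urls crawl 3 5)

if [ "$DRY_RUN" = "1" ]; then
    echo "$cmd"
else
    cd "$HADOOP_HOME" || exit 1
    $cmd                    # submit the crawl job
    hadoop fs -ls crawl     # list the output directory on the distributed filesystem
fi
```

Running the script with DRY_RUN=0 would submit the job from $HADOOP_HOME and then list the "crawl" output directory; this assumes the hadoop command is on your PATH and the urls directory has already been copied to the distributed filesystem.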

