Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "NutchHadoopTutorial" page has been changed by TejasPatil:
https://wiki.apache.org/nutch/NutchHadoopTutorial?action=diff&rev1=42&rev2=43

  
  {{{
  cd $HADOOP_HOME
- hadoop jar apache-nutch-${version}.job org.apache.nutch.crawl.Crawl urls -dir crawlDir -depth 3 -topN 5
+ hadoop jar apache-nutch-${version}.job org.apache.nutch.crawl.Crawl urls -dir crawl -depth 3 -topN 5
  }}}
  
  This runs the nutch crawl command. The "urls" argument is the urls directory 
that we added to the distributed filesystem. The "-dir crawl" option names the 
output directory, which is also written to the distributed filesystem. The 
"-depth 3" option means the crawl follows links at most 3 pages deep. Other 
options are available; see the command documentation for details.
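
As a rough sketch of how the pieces of that command fit together, the snippet below assembles the same invocation from named shell variables and prints it instead of running it. The variable names and the "X.Y" version placeholder are illustrative assumptions, not part of Nutch or Hadoop; substitute the values for your own installation.

```shell
#!/bin/sh
# Sketch only: build the crawl command line from named parts.
# Every variable name below is hypothetical and chosen for readability.

version="X.Y"      # placeholder, replace with your Nutch release number
seed_dir="urls"    # seed URL directory already copied to the distributed filesystem
out_dir="crawl"    # output directory, also on the distributed filesystem
depth=3            # value passed to -depth
top_n=5            # value passed to -topN

crawl_cmd="hadoop jar apache-nutch-${version}.job org.apache.nutch.crawl.Crawl ${seed_dir} -dir ${out_dir} -depth ${depth} -topN ${top_n}"

# Print the command rather than executing it, so it can be inspected
# on a machine without a Hadoop installation.
echo "$crawl_cmd"
```

Once the printed line looks right, it can be run from $HADOOP_HOME as in the tutorial.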
