Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "NutchTutorial" page has been changed by JoeLencioni:
http://wiki.apache.org/nutch/NutchTutorial?action=diff&rev1=38&rev2=39

Comment:
adding crucial parse command

  
  Now the database contains both updated entries for all initial pages as well as new entries that correspond to newly discovered pages linked from the initial set.
  
+ Then we parse the entries:
+ 
+ {{{ bin/nutch parse $s1 }}}
+ 
  Now we generate and fetch a new segment containing the top-scoring 1000 pages:
  
  {{{
@@ -154, +158 @@

  
  bin/nutch fetch $s2
  bin/nutch updatedb crawl/crawldb $s2
+ bin/nutch parse $s2
  }}}
  Let's fetch one more round:
  
@@ -164, +169 @@

  
  bin/nutch fetch $s3
  bin/nutch updatedb crawl/crawldb $s3
+ bin/nutch parse $s3
  }}}
  By this point we've fetched a few thousand pages. Let's index them!
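
The per-segment steps in the hunks above repeat for every round, so they could be wrapped in a small shell function. This is only a sketch: `crawl_round` and the `NUTCH` override variable are hypothetical names, not part of Nutch, and the command order simply mirrors the order shown in the diff (fetch, updatedb, parse). Whether parsing should run before `updatedb` may depend on your Nutch version and configuration, so treat the ordering as an assumption.

```shell
#!/bin/sh
# Hypothetical helper, not part of Nutch: runs the three per-segment
# commands from the tutorial diff on one segment directory.
# NUTCH defaults to bin/nutch; set NUTCH=echo for a dry run that only
# prints the commands.
NUTCH="${NUTCH:-bin/nutch}"

crawl_round() {
  segment="$1"                    # e.g. the value of $s2 or $s3
  crawldb="${2:-crawl/crawldb}"   # crawldb path used in the tutorial

  # Stop at the first failing step so a broken fetch is not parsed.
  $NUTCH fetch "$segment" &&
  $NUTCH updatedb "$crawldb" "$segment" &&
  $NUTCH parse "$segment"
}
```

With this in place, a round on the second segment would be `crawl_round "$s2"`, and `NUTCH=echo` followed by `crawl_round "$s2"` prints the three commands without running them.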
  
