Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The "NutchTutorial" page has been changed by JoeLencioni:
http://wiki.apache.org/nutch/NutchTutorial?action=diff&rev1=38&rev2=39

Comment:
adding crucial parse command

  Now the database contains both updated entries for all initial pages as well as new entries that correspond to newly discovered pages linked from the initial set.

+ Then we parse the entries:
+
+ {{{ bin/nutch parse $1 }}}
+
  Now we generate and fetch a new segment containing the top-scoring 1000 pages:

  {{{
@@ -154, +158 @@

  bin/nutch fetch $s2
  bin/nutch updatedb crawl/crawldb $s2
+ bin/nutch parse $s2
  }}}

  Let's fetch one more round:

@@ -164, +169 @@

  bin/nutch fetch $s3
  bin/nutch updatedb crawl/crawldb $s3
+ bin/nutch parse $s3
  }}}

  By this point we've fetched a few thousand pages. Let's index them!
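
The per-segment commands that this change touches can be sketched as a small shell helper. This is only an illustration: the function name `crawl_round`, the `NUTCH` override variable, and the dry-run default are assumptions and are not part of the tutorial. The command order mirrors the diff above and is not a claim about the correct order for any particular Nutch release.

```shell
#!/bin/sh
# Sketch of one crawl round, using only the commands shown in the diff.
# NUTCH is a hypothetical override; by default the commands are printed
# (dry run) rather than executed.
NUTCH="${NUTCH:-echo bin/nutch}"

# crawl_round SEGMENT
# SEGMENT is the path to a segment directory (the tutorial's $s2, $s3, ...).
crawl_round() {
    segment="$1"
    # $NUTCH is intentionally unquoted so "echo bin/nutch" splits into words.
    $NUTCH fetch "$segment" &&
    $NUTCH updatedb crawl/crawldb "$segment" &&
    $NUTCH parse "$segment"
}
```

With the default dry-run setting, `crawl_round "$s2"` prints the three commands for that segment; setting `NUTCH=bin/nutch` before sourcing the helper would run them for real, assuming the working directory is a Nutch installation.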

