Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The "NutchTutorial" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/NutchTutorial?action=diff&rev1=48&rev2=49

  To fetch, we first generate a fetch list from the database:
  {{{
- bin/nutch generate crawldb segments
+ bin/nutch generate crawldb crawldb/segments
  }}}
  This generates a fetch list for all of the pages due to be fetched. The fetch list is placed in a newly created segment directory. The segment directory is named by the time it's created. We save the name of this segment in the shell variable {{{s1}}}:
  {{{
- s1=`ls -d segments/2* | tail -1`
+ s1=`ls -d crawldb/segments/2* | tail -1`
  echo $s1
  }}}
  Now we run the fetcher on this segment with:
@@ -172, +172 @@
  Now we generate and fetch a new segment containing the top-scoring 1000 pages:
  {{{
- bin/nutch generate crawldb segments -topN 1000
+ bin/nutch generate crawldb crawldb/segments -topN 1000
  s2=`ls -d segments/2* | tail -1`
  echo $s2
@@ -183, +183 @@
  Let's fetch one more round:
  {{{
- bin/nutch generate crawldb segments -topN 1000
+ bin/nutch generate crawldb crawldb/segments -topN 1000
  s3=`ls -d segments/2* | tail -1`
  echo $s3
@@ -197, +197 @@
  Before indexing we first invert all of the links, so that we may index incoming anchor text with the pages.
  {{{
- bin/nutch invertlinks linkdb -dir segments
+ bin/nutch invertlinks crawldb/linkdb -dir crawldb/segments
  }}}
  We are now ready to search with Apache Solr.
@@ -222, +222 @@
  * run the Solr Index command:
  {{{
- bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb crawl/linkdb crawl/segments/*
+ bin/nutch solrindex http://127.0.0.1:8983/solr/ crawldb crawldb/linkdb crawldb/segments/*
  }}}
  This will send all crawl data to Solr for indexing. For more information please see bin/nutch solrindex
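
The `ls -d crawldb/segments/2* | tail -1` idiom in the change above works because segment directories are named by creation time, so a name-sorted listing ends with the newest one. The following is a minimal sketch of that selection step only. It does not run Nutch: the temporary sandbox and the three timestamp-like directory names are made up for illustration, on the assumption that segment names are fixed-width timestamps starting with `2`.

```shell
#!/bin/sh
# Hypothetical sandbox: empty directories stand in for real Nutch segments.
tmp=$(mktemp -d)
mkdir -p "$tmp/crawldb/segments/20110701120000" \
         "$tmp/crawldb/segments/20110701120500" \
         "$tmp/crawldb/segments/20110701121000"

# Same idiom as the tutorial: ls sorts the matches by name, and
# fixed-width timestamp names sort chronologically, so the last
# line of the listing is the most recently created segment.
s1=`ls -d "$tmp"/crawldb/segments/2* | tail -1`
echo "$s1"

# Remove the sandbox; $s1 still holds the selected path string.
rm -rf "$tmp"
```

Note that the glob must point at the same directory that `bin/nutch generate` writes to. In the diff above, the unchanged context lines for `s2` and `s3` still list `segments/2*`, so they would presumably need the `crawldb/` prefix as well to pick up the newly generated segments.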

