[Nutch Wiki] Trivial Update of "NutchTutorial" by LewisJohnMcgibbney

Apache Wiki Fri, 30 Sep 2011 12:19:51 -0700

The "NutchTutorial" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/NutchTutorial?action=diff&rev1=48&rev2=49

  To fetch, we first generate a fetch list from the database:
  
  {{{
- bin/nutch generate crawldb segments
+ bin/nutch generate crawldb crawldb/segments
  }}}
  This generates a fetch list for all of the pages due to be fetched. The fetch list is placed in a newly created segment directory. The segment directory is named by the time it's created. We save the name of this segment in the shell variable {{{s1}}}:
  
  {{{
- s1=`ls -d segments/2* | tail -1`
+ s1=`ls -d crawldb/segments/2* | tail -1`
  echo $s1
  }}}
  Now we run the fetcher on this segment with:
@@ -172, +172 @@

  Now we generate and fetch a new segment containing the top-scoring 1000 pages:
  
  {{{
- bin/nutch generate crawldb segments -topN 1000
+ bin/nutch generate crawldb crawldb/segments -topN 1000
  s2=`ls -d segments/2* | tail -1`
  echo $s2
  
@@ -183, +183 @@

  Let's fetch one more round:
  
  {{{
- bin/nutch generate crawldb segments -topN 1000
+ bin/nutch generate crawldb crawldb/segments -topN 1000
  s3=`ls -d segments/2* | tail -1`
  echo $s3
  
@@ -197, +197 @@

  Before indexing we first invert all of the links, so that we may index incoming anchor text with the pages.
  
  {{{
- bin/nutch invertlinks linkdb -dir segments
+ bin/nutch invertlinks crawldb/linkdb -dir crawldb/segments
  }}}
  We are now ready to search with Apache Solr.
  
@@ -222, +222 @@

   * run the Solr Index command:
  
  {{{
- bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb crawl/linkdb crawl/segments/*
+ bin/nutch solrindex http://127.0.0.1:8983/solr/ crawldb crawldb/linkdb crawldb/segments/*
  }}}
  This will send all crawl data to Solr for indexing. For more information please see bin/nutch solrindex
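
The changed commands rely on one shell idiom: segment directories are named by creation time, so the last entry of a sorted listing is the newest segment. The sketch below tries that idiom in isolation, without a Nutch install. The scratch directory and the three timestamp-style directory names are hypothetical stand-ins made up for illustration; only the `ls -d crawldb/segments/2* | tail -1` pattern comes from the diff above.

```shell
#!/bin/sh
# Hypothetical demo: fake segment directories in a scratch location.
# The timestamp-like names are invented; a real crawl creates its own.
workdir=$(mktemp -d)
mkdir -p "$workdir/crawldb/segments/20110930121951" \
         "$workdir/crawldb/segments/20110930123000" \
         "$workdir/crawldb/segments/20110930124500"

# Same idiom as the tutorial: the names sort chronologically,
# so the last line of the listing is the most recent segment.
latest=$(cd "$workdir" && ls -d crawldb/segments/2* | tail -1)
echo "$latest"

# Remove the scratch directory again.
rm -rf "$workdir"
```

Because the glob is relative, the result depends on the directory the command runs from, which is why the path prefix in the listing has to match the one passed to `bin/nutch generate`.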
