[Nutch Wiki] Trivial Update of "RunningNutchAndSolr" by LewisJohnMcgibbney

Apache Wiki Fri, 24 Jun 2011 13:09:51 -0700

The "RunningNutchAndSolr" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/RunningNutchAndSolr?action=diff&rev1=56&rev2=57

  export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home
  }}}
  
- '''4.''' Extract the Nutch package       tar xzf apache-nutch-1.0.tar.gz
+ '''3.''' Crawl your first website:
+  * Add your agent name in the value field of the http.agent.name property in conf/nutch-site.xml, for example:
+ {{{
+ <property>
+  <name>http.agent.name</name>
+  <value>My Nutch Spider</value>
+ </property>
+ }}}
+  * mkdir -p urls
+  * Create a file named nutch in the urls directory with the following content:
+ {{{
+ http://nutch.apache.org/
+ }}}
+ or any site you want Nutch to crawl.
+  * Run the following command:
+ {{{
+ bin/nutch crawl urls -dir crawl -depth 3 -topN 5
+ }}}
+  * You should now see the following directories:
+ {{{
+ crawl/crawldb 
+ crawl/linkdb
+ crawl/segments
+ }}}
  
  '''5.''' Configure Solr. For the sake of simplicity we are going to use the example configuration of Solr as a base.
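
The seed-list and crawl commands from step 3 can be collected into one script. This is a minimal sketch, not part of the wiki page: it assumes it is run from the Nutch installation directory, uses http://nutch.apache.org/ as the seed, and the variable names are illustrative. The crawl is skipped when bin/nutch is not present.

```shell
#!/bin/sh
# Sketch of step 3: prepare a seed list, then crawl if Nutch is available.
# Assumption: the current directory is the Nutch installation directory.
set -eu

SEED_DIR=urls                        # directory passed to the crawl command
SEED_FILE="$SEED_DIR/nutch"          # seed file named as in step 3
SEED_URL="http://nutch.apache.org/"  # replace with any site you want to crawl

# Create the seed directory and write one URL per line.
mkdir -p "$SEED_DIR"
printf '%s\n' "$SEED_URL" > "$SEED_FILE"

if [ -x bin/nutch ]; then
  # Same command as in step 3.
  bin/nutch crawl "$SEED_DIR" -dir crawl -depth 3 -topN 5

  # Report whether the expected output directories exist.
  for d in crawl/crawldb crawl/linkdb crawl/segments; do
    if [ -d "$d" ]; then
      echo "found $d"
    else
      echo "missing $d"
    fi
  done
else
  echo "bin/nutch not found; run this from the Nutch installation directory" >&2
fi
```

The http.agent.name property in conf/nutch-site.xml still has to be set by hand before the crawl is run.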
