Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "RunningNutchAndSolr" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/RunningNutchAndSolr?action=diff&rev1=56&rev2=57

  export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home
  }}}
  
- '''4.''' Extract the Nutch package       tar xzf apache-nutch-1.0.tar.gz
+ '''3.''' Crawl your first website:
+  *  Add your agent name in the value field of the http.agent.name property in 
conf/nutch-site.xml, for example:
+ {{{
+ <property>
+  <name>http.agent.name</name>
+  <value>My Nutch Spider</value>
+ </property>
+ }}}
+  * mkdir -p urls
+  * Create a file named nutch in the urls directory with the following content:
+ {{{
+ http://nutch.apache.org/
+ }}}
+ or any site you want Nutch to crawl.
+  * Run the following command:
+ {{{
+ bin/nutch crawl urls -dir crawl -depth 3 -topN 5
+ }}}
+  * You should now see the following directories:
+ {{{
+ crawl/crawldb
+ crawl/linkdb
+ crawl/segments
+ }}}
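
The seed-list and crawl steps above can be sketched as one small script. This is only an illustration, not part of the wiki page: it assumes it is run from the top of the Nutch install directory, the variable names SEED_DIR and SEED_FILE are made up for the example, and the crawl is skipped when bin/nutch is not present.

```shell
#!/bin/sh
# Sketch of step 3: write the seed list, then crawl if Nutch is available.
set -eu

SEED_DIR="urls"               # hypothetical name; matches the directory used above
SEED_FILE="$SEED_DIR/nutch"   # hypothetical name; seed file from the step above

# Create the seed directory and write one URL per line.
mkdir -p "$SEED_DIR"
printf '%s\n' 'http://nutch.apache.org/' > "$SEED_FILE"

if [ -x bin/nutch ]; then
  # Same command as in the wiki text.
  bin/nutch crawl "$SEED_DIR" -dir crawl -depth 3 -topN 5

  # Report which of the expected output directories exist.
  for d in crawl/crawldb crawl/linkdb crawl/segments; do
    if [ -d "$d" ]; then
      echo "found $d"
    else
      echo "missing $d"
    fi
  done
else
  echo "bin/nutch not found; seed list written to $SEED_FILE"
fi
```

Replace the URL in the printf line with any site you want Nutch to crawl.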
  
  '''5.''' Configure Solr. For the sake of simplicity we are going to use the 
example configuration of Solr as a base.
  
