Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The "RunningNutchAndSolr" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/RunningNutchAndSolr?action=diff&rev1=56&rev2=57

  export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home
  }}}
- '''4.''' Extract the Nutch package
  tar xzf apache-nutch-1.0.tar.gz
+ '''3.''' Crawl your first website:
+  * Add your agent name in the value field of the http.agent.name property in conf/nutch-site.xml, for example:
+ {{{
+ <property>
+  <name>http.agent.name</name>
+  <value>My Nutch Spider</value>
+ </property>
+ }}}
+  * Create the seed directory: mkdir -p urls
+  * Create a file named nutch under urls/ with the following content:
+ {{{
+ http://nutch.apache.org/
+ }}}
+ or any site you want Nutch to crawl.
+  * Run the following command:
+ {{{
+ bin/nutch crawl urls -dir crawl -depth 3 -topN 5
+ }}}
+  * You should now see the following directories:
+ {{{
+ crawl/crawldb
+ crawl/linkdb
+ crawl/segments
+ }}}
  '''5.''' Configure Solr
  For the sake of simplicity we are going to use the example configuration of Solr as a base.
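The crawl steps in item 3 could be collected into one script. This is only a minimal sketch: it assumes it is run from the Nutch installation directory, that http.agent.name has already been set in conf/nutch-site.xml, and that the variable names (SEED_DIR, SEED_FILE, CRAWL_DIR) are illustrative and not part of Nutch. If bin/nutch is not present, the script only writes the seed file.

```shell
#!/bin/sh
# Sketch of the step 3 crawl workflow (variable names are hypothetical).
SEED_DIR=urls               # directory holding the seed list
SEED_FILE="$SEED_DIR/nutch" # seed file named in step 3
CRAWL_DIR=crawl             # output directory passed to -dir

# Create the seed directory and write one URL per line.
mkdir -p "$SEED_DIR"
printf '%s\n' 'http://nutch.apache.org/' > "$SEED_FILE"

if [ -x bin/nutch ]; then
    # Same command as in the wiki page: depth 3, top 5 URLs per level.
    bin/nutch crawl "$SEED_DIR" -dir "$CRAWL_DIR" -depth 3 -topN 5

    # Report which of the expected output directories exist.
    for d in crawldb linkdb segments; do
        if [ -d "$CRAWL_DIR/$d" ]; then
            echo "found $CRAWL_DIR/$d"
        else
            echo "missing $CRAWL_DIR/$d" >&2
        fi
    done
else
    echo "bin/nutch not found; wrote seed file $SEED_FILE only" >&2
fi
```

To crawl a different site, replace the URL written to the seed file; the -depth and -topN values are the ones from the page and can be raised for a larger crawl.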

