Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The "RunningNutchAndSolr" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/RunningNutchAndSolr?action=diff&rev1=62&rev2=63

  From now on, we are going to use ${NUTCH_RUNTIME_HOME} to refer to the current directory.

- '''2.''' Verify your Nutch installation:
+ == 2. Verify your Nutch installation ==
+
   * run "bin/nutch" - You can confirm a correct installation if you see the following:
  {{{
  Usage: nutch [-core] COMMAND
@@ -43, +44 @@

  export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home
  }}}

- '''3.''' Crawl your first website:
+ == 3. Crawl your first website ==
+
   * Add your agent name in the value field of the http.agent.name property in conf/nutch-site.xml, for example:
  {{{
  <property>
@@ -73, +75 @@

  }}}
  If not, then please read on for how to set up your Solr instance and index your crawl data.

- '''4a.''' Setup Solr for search from binary distribution:
+ == 4a. Setup Solr for search from binary distribution ==
+
   * download binary file from [[http://www.apache.org/dyn/closer.cgi/lucene/solr/|here]]
   * unzip to $HOME/apache-solr-3.X, we will now refer to this as ${APACHE_SOLR_HOME}
   * cd ${APACHE_SOLR_HOME}/example
   * java -jar start.jar

- '''4b.''' Setup Solr for search from source distribution:
+ == 4b. Setup Solr for search from source distribution ==
+
   * You can set up Solr from the source distribution with Maven. This [[http://thetechietutorials.blogspot.com/2011/06/how-to-build-and-start-apache-solr.html|link]] shows how to do that.

- '''5.''' Verify Solr installation:
+ == 5. Verify Solr installation ==
+
  After you have started Solr, you should be able to access the following admin console links:
  {{{
  http://localhost:8983/solr/admin/
  http://localhost:8983/solr/admin/stats.jsp
  }}}

- '''6.''' Integrate Solr with Nutch
+ == 6. Integrate Solr with Nutch ==
+
  We have both Nutch and Solr installed and set up correctly, and Nutch has already created crawl data from the seed URL(s).
  Below are the steps to delegate searching to Solr so that links become searchable:
   * cp ${NUTCH_RUNTIME_HOME}/conf/schema.xml ${APACHE_SOLR_HOME}/example/solr/conf/
   * restart Solr with the command "java -jar start.jar" under ${APACHE_SOLR_HOME}/example
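
The two integration steps above could be wrapped in a small shell helper, sketched below. This is only an illustration and not part of the wiki page: the function name copy_nutch_schema is hypothetical, and it assumes the directory layout named in the steps (conf/schema.xml under the Nutch runtime directory, example/solr/conf/ under the Solr directory). It checks that both paths exist before copying, so a wrong ${NUTCH_RUNTIME_HOME} or ${APACHE_SOLR_HOME} is reported instead of producing a confusing cp error.

```shell
#!/bin/sh
# Sketch of step 6: copy Nutch's schema.xml into the Solr example configuration.
# copy_nutch_schema is a hypothetical helper name; the paths come from the steps above.

copy_nutch_schema() {
    nutch_home="$1"   # what the page calls ${NUTCH_RUNTIME_HOME}
    solr_home="$2"    # what the page calls ${APACHE_SOLR_HOME}
    src="$nutch_home/conf/schema.xml"
    dest="$solr_home/example/solr/conf"

    # Stop early if either side of the copy is missing.
    [ -f "$src" ] || { echo "not found: $src" >&2; return 1; }
    [ -d "$dest" ] || { echo "not found: $dest" >&2; return 1; }

    cp "$src" "$dest/" && echo "copied schema.xml to $dest"
}

# Typical use. Solr is restarted by hand afterwards, because
# "java -jar start.jar" keeps running in the foreground:
#   copy_nutch_schema "$NUTCH_RUNTIME_HOME" "$APACHE_SOLR_HOME"
#   cd "$APACHE_SOLR_HOME/example" && java -jar start.jar
```

The restart is left as a manual step on purpose: a script that launched start.jar itself would block until Solr is stopped.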

