Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The following page has been changed by NickTkach:
http://wiki.apache.org/nutch/RunningNutchAndSolr

------------------------------------------------------------------------------
  1. Check out solr-trunk and nutch-trunk
  1. Go into the solr-trunk and run 'ant dist dist-solrj'
- 1. Get zip from [http://variogram.com/latest/SolrIndexer.zip|Variogr.am] and unzip it to solr-trunk
+ 1. Get zip from [http://variogram.com/latest/SolrIndexer.zip| Variogr.am] and unzip it to solr-trunk
  1. Copy apache-solr-solrj-1.3-dev.jar and apache-solr-common-1.3-dev.jar to nutch-trunk/lib
- 1. Get the zip file from [http://blog.foofactory.fi/2007/02/online-indexing-integrating-nutch-with.html|FooFactory] for SOLR-20
+ 1. Get the zip file from [http://blog.foofactory.fi/2007/02/online-indexing-integrating-nutch-with.html| FooFactory] for SOLR-20
  1. Unzip solr-client.zip somewhere, go into java/solr/src and run 'ant'
  1. Copy solr-client.jar from dist to nutch-trunk/lib
  1. Copy xpp3-1.1.3.4.0.jar from lib to nutch-trunk/lib
- 1. Get SolrClientAdapter.java from [http://www.foofactory.fi/files/nutch-solr/nutch_solr.patch|FooFactory patch] and copy it to nutch-trunk/src/java/org/apache/nutch/indexer
+ 1. Get SolrClientAdapter.java from [http://www.foofactory.fi/files/nutch-solr/nutch_solr.patch| FooFactory patch] and copy it to nutch-trunk/src/java/org/apache/nutch/indexer
  * Edit nutch-trunk/src/java/org/apache/nutch/indexer/SolrIndexer.java:
   * Replace int res = new SolrIndexer().doMain(NutchConfiguration.create(), args); with int res = ToolRunner.run(NutchConfiguration.create(), new SolrIndexer(), args);
  1. Edit the imports to pick up ToolRunner
  1. Edit nutch-trunk/src/java/org/apache/nutch/indexer/Indexer.java changing scope on LuceneDocumentWrapper from private to protected
  1. Configure nutch-trunk/conf/nutch-site.xml with settings for your site including a value for property indexer.solr.url (something like http://localhost:8983/solr/)
  1. Configure some url(s) to crawl (files in a urls directory)
- 1. Copy [http://www.foofactory.fi/files/nutch-solr/crawl.sh|Crawl.sh script] from FooFactory and copy it to nutch-trunk/bin (editing if needed)
+ 1. Copy [http://www.foofactory.fi/files/nutch-solr/crawl.sh| Crawl.sh script] from FooFactory and copy it to nutch-trunk/bin (editing if needed)
  1. Start a Solr server (such as the solr-trunk/example instance)
  1. Run a Nutch crawl using the bin/crawl.sh script.
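
The SolrIndexer.java step above swaps a direct doMain call for ToolRunner.run, which hands the configuration to the tool before running it. The following is only a rough sketch of that calling pattern, not the actual SolrIndexer code. To compile with just the JDK it uses simplified stand-ins (Conf, Tool, run, SolrIndexerSketch, all hypothetical names) in place of Hadoop's org.apache.hadoop.conf.Configuration, org.apache.hadoop.util.Tool and org.apache.hadoop.util.ToolRunner. The real ToolRunner also parses generic Hadoop options, which is left out here.

```java
import java.util.HashMap;
import java.util.Map;

public class ToolRunnerSketch {

    /** Simplified stand-in for org.apache.hadoop.conf.Configuration (assumption). */
    static class Conf {
        private final Map<String, String> values = new HashMap<>();
        void set(String key, String value) { values.put(key, value); }
        String get(String key) { return values.get(key); }
    }

    /**
     * Simplified stand-in for org.apache.hadoop.util.Tool (assumption).
     * The real Tool.run is declared to throw Exception; omitted here for brevity.
     */
    interface Tool {
        void setConf(Conf conf);
        Conf getConf();
        int run(String[] args);
    }

    /** Stand-in for ToolRunner.run(conf, tool, args): inject the configuration, then run the tool. */
    static int run(Conf conf, Tool tool, String[] args) {
        tool.setConf(conf);
        return tool.run(args);
    }

    /** Hypothetical indexer tool; the real SolrIndexer does far more than this. */
    static class SolrIndexerSketch implements Tool {
        private Conf conf;

        public void setConf(Conf conf) { this.conf = conf; }
        public Conf getConf() { return conf; }

        public int run(String[] args) {
            // indexer.solr.url is the property the page says to set in nutch-site.xml.
            String solrUrl = getConf().get("indexer.solr.url");
            if (solrUrl == null || args.length == 0) {
                System.err.println("Need at least one argument and indexer.solr.url set");
                return -1;
            }
            System.out.println("Would index " + args.length + " path(s) to " + solrUrl);
            return 0;
        }
    }

    public static void main(String[] args) {
        Conf conf = new Conf();
        conf.set("indexer.solr.url", "http://localhost:8983/solr/");
        // Mirrors: int res = ToolRunner.run(NutchConfiguration.create(), new SolrIndexer(), args);
        int res = run(conf, new SolrIndexerSketch(), args);
        System.out.println("exit code: " + res);
    }
}
```

In an actual nutch-trunk checkout, the import the page refers to would be org.apache.hadoop.util.ToolRunner, and SolrIndexer itself would need to implement Hadoop's Tool interface for the replaced line to compile.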