Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The "RunningNutchAndSolr" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/RunningNutchAndSolr?action=diff&rev1=51&rev2=52

  ## page was renamed from Running Nutch 1.3 with Solr Integration
  ## page was renamed from RunningNutchAndSolr
  ## Lang: En
- =RunningNutchAndSolr=
- This tutorial was originally constructed and posted by 'waycool' on the user lists. It has been edited slightly for integration into the Apache Nutch project.

  Apache Nutch is an open source web crawler written in Java. With it, we can discover hyperlinks in an automated manner, reduce maintenance work such as checking for broken links, and create a copy of all the visited pages for future search. That’s where Apache Solr comes in. Solr is an open source full-text search framework; with Solr we can search the pages visited by Nutch. Luckily, integration between Nutch and Solr is pretty straightforward, as explained below.

@@ -13, +11 @@

  Apache Nutch release 1.3 has Solr integration embedded, which greatly eases Nutch-Solr integration. It also removes the legacy dependence upon both Apache Tomcat for running the old Nutch Web Application and upon Apache Lucene for indexing. Just download a 1.3 release from [[http://www.apache.org/dyn/closer.cgi/nutch/|here]].

  NOTE: You can download release 1.3 in either binary or source format, both of which are covered in this tutorial.

  == Steps ==
- Setup Nutch from binary distribution:
+ '''1a.''' Setup Nutch from binary distribution:
- '''1a.''' Unzip your binary Nutch package to $HOME/nutch-1.3
+ i. Unzip your binary Nutch package to $HOME/nutch-1.3
- cd $HOME/nutch-1.3/runtime/local
+ ii. cd $HOME/nutch-1.3/runtime/local
- Setup Nutch from source distribution:
+ '''1b.''' Setup Nutch from source distribution:
- '''1b.''' Unzip your source package to $HOME/nutch-1.3-src
+ i. Unzip your source package to $HOME/nutch-1.3-src
- cd $HOME/nutch-1.3-src
+ ii. cd $HOME/nutch-1.3-src
- run “ant” command.
+ iii. run “ant” command.
- It should generate a directory called $HOME/nutch-1.3-src/runtime.
+ iv. It should generate a directory called $HOME/nutch-1.3-src/runtime.
- cd $HOME/nutch-1.3-src/runtime/local
+ v. cd $HOME/nutch-1.3-src/runtime/local
+
+ From now on, we are going to use ${NUTCH_RUNTIME_HOME} to refer to the current directory.

  '''3.''' Download Nutch version 1.0 or later (Alternatively download the nightly version of Nutch that contains the required functionality)
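
For readers following the revised steps, here is a minimal shell sketch of how ${NUTCH_RUNTIME_HOME} might be set for either layout. It assumes the package was unpacked to the paths the page names ($HOME/nutch-1.3 or $HOME/nutch-1.3-src) and, for the source layout, that "ant" has already been run. The function name nutch_runtime_home is a hypothetical helper introduced for illustration; it is not part of Nutch.

```shell
#!/bin/sh
# Sketch only: map a distribution type to the runtime directory named in
# steps 1a and 1b. nutch_runtime_home is a hypothetical helper, not a Nutch tool.
nutch_runtime_home() {
    case "$1" in
        binary) printf '%s\n' "$HOME/nutch-1.3/runtime/local" ;;
        source) printf '%s\n' "$HOME/nutch-1.3-src/runtime/local" ;;
        *)
            echo "usage: nutch_runtime_home binary|source" >&2
            return 1
            ;;
    esac
}

# Assumption: the source package was unpacked and built with "ant" beforehand.
NUTCH_RUNTIME_HOME="$(nutch_runtime_home source)"
export NUTCH_RUNTIME_HOME
echo "NUTCH_RUNTIME_HOME=$NUTCH_RUNTIME_HOME"
```

Swapping "source" for "binary" gives the path from step 1a. The function only builds the path string; it does not check that the directory exists.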

