Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "RunningNutchAndSolr" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/RunningNutchAndSolr?action=diff&rev1=50&rev2=51

  ## page was renamed from Nutch1.3WithSolrIntegration
  ## page was renamed from Running Nutch 1.3 with Solr Integration
  ## page was renamed from RunningNutchAndSolr
+ ## Lang: En
+ = RunningNutchAndSolr =
+ 
  This tutorial was originally constructed and posted by 'waycool' on the user 
lists. It has been edited slightly for integration into the Apache Nutch 
project.
  
+ Apache Nutch is an open source web crawler written in Java. With it, we can 
discover hyperlinks in an automated manner, cut down a lot of maintenance work 
(checking for broken links, for example), and create a copy of all the visited 
pages for later searching. That is where Apache Solr comes in: Solr is an open 
source full-text search framework, and with it we can search the pages visited 
by Nutch. Luckily, integrating Nutch with Solr is fairly straightforward, as 
explained below.
- = Notes about Nutch 1.3 =
- Please note that Apache Nutch release 1.3 has Solr integration embedded, this 
greatly eases Nutch-Solr integration. Just download release 1.3 from 
[[http://www.apache.org/dyn/closer.cgi/nutch/|here]]. This also removes the 
legacy dependence upon both Apache Tomcat for running the old Nutch WebApp and 
upon Lucene for indexing
  
+ Apache Nutch release 1.3 has Solr integration embedded, which greatly eases 
Nutch-Solr integration. It also removes the legacy dependence on both Apache 
Tomcat (for running the old Nutch Web Application) and Apache Lucene (for 
indexing). Just download a 1.3 release from 
[[http://www.apache.org/dyn/closer.cgi/nutch/|here]]. NOTE: You can download 
release 1.3 in either binary or source format; both are covered in this 
tutorial.
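If you prefer to fetch the archive from the command line, the helper below is a minimal sketch. It assumes (unverified) that old releases are kept on the Apache archive host under `dist/nutch/<version>/` with file names of the form `apache-nutch-<version>-<bin|src>.<zip|tar.gz>`; the function names `nutch_archive_url` and `download_nutch` are hypothetical. Check the mirror page linked above for the real file names before relying on it.

```shell
#!/bin/sh
# Hypothetical download helper for a Nutch 1.3 archive.
# ASSUMPTION: the base URL and file naming scheme below are not confirmed
# by this tutorial; override NUTCH_DIST_BASE if your mirror differs.
NUTCH_VERSION="${NUTCH_VERSION:-1.3}"
NUTCH_DIST_BASE="${NUTCH_DIST_BASE:-https://archive.apache.org/dist/nutch}"

# Print the URL for one archive.
#   $1: bin or src (binary or source distribution)
#   $2: zip or tar.gz
nutch_archive_url() {
  case "$1" in bin|src) ;; *) echo "format must be bin or src" >&2; return 1 ;; esac
  case "$2" in zip|tar.gz) ;; *) echo "extension must be zip or tar.gz" >&2; return 1 ;; esac
  echo "$NUTCH_DIST_BASE/$NUTCH_VERSION/apache-nutch-$NUTCH_VERSION-$1.$2"
}

# Download one archive into the current directory (requires curl).
# -f fails on HTTP errors, -L follows redirects, -O keeps the remote file name.
download_nutch() {
  nutch_url="$(nutch_archive_url "$1" "$2")" || return 1
  curl -fLO "$nutch_url"
}
```

For example, after sourcing the file, `download_nutch bin zip` would fetch the binary package and `download_nutch src zip` the source package.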
+  
- == Ubuntu Note ==
- 
- If you are using more recent versions of Ubuntu Solr comes as a package 
installable through apt-get 
- 
- {{{
- sudo apt-get install solr-tomcat
- }}}
- 
- A more in-depth howto for Ubuntu Server 10.04 Lucid Lynx is available here: 
http://ubuntuforums.org/showthread.php?p=9596257
- 
- You might wish to install it that way instead of as follows. If so then you 
will find the solr config in /etc/solr/conf 
- and the web interface can be found at http://localhost:8080/solr/
- 
  == Steps ==
- The first step to get started is to download the required software 
components, namely Apache Solr and Nutch.
- 
- '''1.''' Download Solr version 1.3.0 or LucidWorks for Solr from Download page
- 
- '''2.''' Extract Solr package
+ Set up Nutch from the binary distribution:
+ '''1a.''' Unzip your binary Nutch package to $HOME/nutch-1.3, then:
+           cd $HOME/nutch-1.3/runtime/local
+ Set up Nutch from the source distribution:
+ '''1b.''' Unzip your source package to $HOME/nutch-1.3-src, then:
+           cd $HOME/nutch-1.3-src
+           Run the "ant" command.
+           It should generate a directory called $HOME/nutch-1.3-src/runtime.
+           cd $HOME/nutch-1.3-src/runtime/local
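Steps 1a and 1b can be sketched as two shell functions. This is a sketch under stated assumptions, not an official script: the archive file names are hypothetical, each archive is assumed to unpack to the directory named in the steps (`$HOME/nutch-1.3` or `$HOME/nutch-1.3-src`), and the function names are made up for illustration.

```shell
#!/bin/sh
# Sketch of steps 1a and 1b.
# ASSUMPTIONS: the archive names below are hypothetical, and each archive is
# assumed to unpack to the directory named in the steps. Adjust as needed.
NUTCH_BIN_ARCHIVE="${NUTCH_BIN_ARCHIVE:-apache-nutch-1.3-bin.zip}"
NUTCH_SRC_ARCHIVE="${NUTCH_SRC_ARCHIVE:-apache-nutch-1.3-src.zip}"

# Print the directory that Nutch commands are run from.
#   $1: bin (binary distribution) or src (source distribution)
nutch_runtime_dir() {
  case "$1" in
    bin) echo "$HOME/nutch-1.3/runtime/local" ;;
    src) echo "$HOME/nutch-1.3-src/runtime/local" ;;
    *)   echo "usage: nutch_runtime_dir bin|src" >&2; return 1 ;;
  esac
}

# Step 1a: unpack the binary package into $HOME, then enter runtime/local.
setup_from_binary() {
  unzip -q "$NUTCH_BIN_ARCHIVE" -d "$HOME" || return 1
  cd "$(nutch_runtime_dir bin)" || return 1
}

# Step 1b: unpack the source package into $HOME, build it with Ant (which
# should create ./runtime), then enter runtime/local. Requires ant on PATH.
setup_from_source() {
  unzip -q "$NUTCH_SRC_ARCHIVE" -d "$HOME" || return 1
  cd "$HOME/nutch-1.3-src" || return 1
  ant || return 1
  cd "$(nutch_runtime_dir src)" || return 1
}
```

Because the functions call `cd`, source the file into your current shell (for example `. ./nutch-setup.sh`, a hypothetical file name) and then run `setup_from_binary` or `setup_from_source`. Running it as a separate script would change directory only inside that child process.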
  
  '''3.''' Download Nutch version 1.0 or later (alternatively, download the 
nightly version of Nutch that contains the required functionality)
  
