The following page has been changed by EarlCahill:
http://wiki.apache.org/nutch/GettingNutchRunningWithUbuntu

New page:
Recently, and with a bit of effort, I got db1.spack up and running on nutch trunk. I decided to keep track of what I did to get db2.spack up and running, and contribute this tutorial.

== Install Ubuntu ==
Here are some minimal steps:
 * got "The Hoary Hedgehog" from http://www.ubuntu.com/download/
 * entered 'server' on the install screen
 * the rest, I thought, was a breeze
 * I did run 'sudo passwd' to set a root password, which let me run the commands below as root

Just a little plug for Ubuntu. I guess I have a funny setup. I built an Athlon 3200+ machine with on-board SATA drives that I wanted to RAID, and I wanted to run Java. Getting those few things working together took me a couple of months, off and on. Once I found Ubuntu, it took about a night. The Java part took another day or two. Ubuntu was pretty well exactly what I was looking for: a stripped-down Debian that installs almost nothing by default and lets me apt-get install about whatever I want if the need arises. It could probably install ssh by default, though.

As a side note, I just spent about five minutes trying these steps on a rather old box running Debian, and it didn't immediately work, though I will try again another day.

== Add Nutch User ==
Let's add a nutch user to do our nutch stuff
{{{
# adduser nutch
}}}

== java ==
I tried to get Java from the normal apt sources without luck, and I am guessing my Athlon is what broke it. I gave up and got Java from Sun (http://java.sun.com/j2se/1.5.0/download.jsp), using the Download JDK 5.0 Update 4 link. I tried 1.4.2 first and it didn't work, but 1.5.0 did.
{{{
[EMAIL PROTECTED]:/opt# ./jdk-1_5_0_04-linux-amd64.bin
}}}

Let's put JAVA_HOME in ~/.bash_profile for both root and nutch, and source the file in each case
{{{
# echo 'export JAVA_HOME=/opt/jdk1.5.0_04' >> ~/.bash_profile
# . ~/.bash_profile
[EMAIL PROTECTED]:~$ echo 'export JAVA_HOME=/opt/jdk1.5.0_04' >> ~/.bash_profile
[EMAIL PROTECTED]:~$ . ~/.bash_profile
}}}

== apt ==
I changed my /etc/apt/sources.list to include
{{{
deb http://ubuntu-backports.mirrormax.net/ hoary-backports main universe multiverse restricted
deb http://ubuntu-backports.mirrormax.net/ hoary-extras main universe multiverse restricted
deb http://us.archive.ubuntu.com/ubuntu hoary main restricted
deb-src http://us.archive.ubuntu.com/ubuntu hoary main restricted
deb http://us.archive.ubuntu.com/ubuntu hoary-updates main restricted
deb-src http://us.archive.ubuntu.com/ubuntu hoary-updates main restricted
deb http://us.archive.ubuntu.com/ubuntu hoary universe
deb-src http://us.archive.ubuntu.com/ubuntu hoary universe
deb http://security.ubuntu.com/ubuntu hoary-security main restricted
deb-src http://security.ubuntu.com/ubuntu hoary-security main restricted
}}}

With the new apt sources, let's update
{{{
# apt-get update
}}}

And get the packages we need.
{{{
# apt-get install ssh subversion ant lynx
}}}

ssh is just good to have, subversion is used to get nutch, ant is used to build nutch, and lynx is used to test nutch.
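Before moving on to the build, it can help to confirm that the pieces installed so far are actually visible to the shell. The following is only a minimal sketch, not part of the original steps: `check_prereqs` is a hypothetical helper name, and the tool list in the usage comment assumes the apt packages above provide the commands `ssh`, `svn`, `ant` and `lynx`.

```shell
#!/bin/sh
# Sketch: verify the prerequisites from the steps above before building.
# check_prereqs is a hypothetical helper, not part of Nutch or Ubuntu.

check_prereqs() {
    missing=0

    # JAVA_HOME should point at a JDK directory that contains bin/java.
    if [ -z "${JAVA_HOME:-}" ] || [ ! -x "$JAVA_HOME/bin/java" ]; then
        echo "JAVA_HOME is unset or has no bin/java: '${JAVA_HOME:-}'"
        missing=1
    fi

    # Every argument is a command name that should be found on PATH.
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing command: $tool"
            missing=1
        fi
    done

    # Returns 0 when everything was found, 1 otherwise.
    return "$missing"
}

# Usage (assumed command names for the packages installed above):
#   check_prereqs ssh svn ant lynx && echo "ready to build"
```

Run it once as root and once as the nutch user, since each account sources its own ~/.bash_profile.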
== Build Nutch Code and Index ==
Let's change over to the nutch user
{{{
# su - nutch
}}}

Check out the code
{{{
[EMAIL PROTECTED]:~$ svn checkout http://svn.apache.org/repos/asf/lucene/nutch/
}}}

Since this tutorial is for getting trunk to work, let's go there
{{{
[EMAIL PROTECTED]:~ $ cd ~/nutch/trunk/
}}}

We build with ant
{{{
[EMAIL PROTECTED]:~/nutch/trunk $ ant
}}}

And build a war for tomcat and later searching
{{{
[EMAIL PROTECTED]:~/nutch/trunk $ ant war
}}}

Follow the nutch tutorial (http://lucene.apache.org/nutch/tutorial.html) to build an index, or for a simple index:
{{{
[EMAIL PROTECTED]:~/nutch/trunk $ echo 'http://lucene.apache.org/nutch/' > urls
[EMAIL PROTECTED]:~/nutch/trunk $ perl -pi -e 's|MY.DOMAIN.NAME|lucene.apache.org/nutch|' \
    conf/crawl-urlfilter.txt
[EMAIL PROTECTED]:~/nutch/trunk $ bin/nutch crawl urls -dir crawl.test -depth 3
}}}

See, perl can be useful :)

== tomcat ==
Again, I tried apt without much luck, so I downloaded tomcat from Apache (http://jakarta.apache.org/site/downloads/downloads_tomcat-4.cgi). As above, I put the java stuff in /opt
{{{
[EMAIL PROTECTED]:/opt# tar -xzvf jakarta-tomcat-4.1.31.tar.gz
}}}

Out with the old and in with the new: remove the default ROOT webapp and install the nutch war in its place
{{{
# rm -rf /opt/jakarta-tomcat-4.1.31/webapps/ROOT*
# cp ~nutch/nutch/trunk/build/nutch-0.8-dev.war \
    /opt/jakarta-tomcat-4.1.31/webapps/ROOT.war
}}}

Let's move to where we put the index
{{{
# cd ~nutch/nutch/trunk/crawl.test
}}}

And start tomcat from there
{{{
# /opt/jakarta-tomcat-4.1.31/bin/catalina.sh start
}}}

== Test ==
Connect to tomcat and perform a search.
{{{
$ lynx localhost:8080
}}}

I searched for 'nutch' and all was well! (You can use <TAB> to get to the search input in lynx.)

Tutorial written by Earl Cahill, 2005
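As a postscript to the steps above: since tomcat is started from inside the crawl directory, a quick pre-flight check is to make sure that directory exists and is not empty, and afterwards that the front page answers. This is only a rough sketch; `crawl_dir_ready` and `search_page_up` are hypothetical helper names, and the port and path are simply the ones used in this tutorial.

```shell
#!/bin/sh
# Sketch: sanity checks around the tomcat and test steps.
# Both function names are hypothetical helpers, not part of Nutch or Tomcat.

# Succeeds only if the given directory exists and contains at least one entry.
crawl_dir_ready() {
    dir=$1
    [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]
}

# Succeeds if lynx can fetch the front page; -dump writes the rendered page
# to stdout instead of opening the interactive browser.
search_page_up() {
    lynx -dump "http://localhost:8080/" >/dev/null 2>&1
}

# Usage:
#   crawl_dir_ready ~nutch/nutch/trunk/crawl.test || echo "no crawl data yet"
#   search_page_up && echo "tomcat is answering"
```

A passing `search_page_up` only shows that tomcat is serving a page, so running an actual search in lynx remains the real test.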
