Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The "GettingNutchRunningWithUbuntu" page has been changed by SteveClement.
The comment on this change is: added gora reference.
http://wiki.apache.org/nutch/GettingNutchRunningWithUbuntu?action=diff&rev1=6&rev2=7

--------------------------------------------------

  # su - nutch
  }}}
- Checkout the code
+ Checkout the code AND the gora code
  {{{
- nu...@db2:~$ svn checkout http://svn.apache.org/repos/asf/lucene/nutch/
+ nu...@db2:~$ svn checkout http://svn.apache.org/repos/asf/nutch/trunk nutch
+ nu...@db2:~$ cd nutch
+ nu...@db2:~$ svn checkout https://svn.apache.org/repos/asf/incubator/gora/
  }}}
  Since this tutorial is for getting trunk to work, let's go there
  {{{
- nu...@db2:~ $ cd ~/nutch/trunk/
+ nu...@db2:~ $ cd ~/nutch
  }}}
  We build with ant
  {{{
- nu...@db2:~/nutch/trunk $ ant
+ nu...@db2:~/nutch $ ant
  }}}
  And build a war for tomcat and later searching
@@ -96, +98 @@
  ''If you are using the latest "trunk" stuff, the url seeding has been changed from a single file to a directory. Using trunk (after 0.7.2), put the urls in a file (here, called "nutch") in a DIRECTORY called "urls":''
  {{{
- nu...@db2:~/nutch/trunk $ mkdir urls
+ nu...@db2:~/nutch $ mkdir urls
- nu...@db2:~/nutch/trunk $ echo 'http://lucene.apache.org/nutch/' > urls/nutch
+ nu...@db2:~/nutch $ echo 'http://lucene.apache.org/nutch/' > urls/nutch
  }}}
  ''Using 0.7.2 or before, just put urls in a FILE called "urls":''
  {{{
- nu...@db2:~/nutch/trunk $ echo 'http://lucene.apache.org/nutch/' > urls
+ nu...@db2:~/nutch $ echo 'http://lucene.apache.org/nutch/' > urls
  }}}
  Then, in any case, you specify in the same fashion ("urls" below referring either to a dir or a file, depending on the version you're using):
  {{{
- nu...@db2:~/nutch/trunk $ perl -pi -e 's|MY.DOMAIN.NAME|lucene.apache.org/nutch|' \
+ nu...@db2:~/nutch $ perl -pi -e 's|MY.DOMAIN.NAME|lucene.apache.org/nutch|' \
  conf/crawl-urlfilter.txt
- nu...@db2:~/nutch/trunk $ bin/nutch crawl urls -dir crawl.test -depth 3
+ nu...@db2:~/nutch $ src/bin/nutch crawl urls -dir crawl.test -depth 3
  }}}
  See, perl can be useful :)

