[Nutch Wiki] Update of "GettingNutchRunningWithUbuntu" by SteveClement

Apache Wiki Thu, 23 Dec 2010 01:22:13 -0800

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.


The "GettingNutchRunningWithUbuntu" page has been changed by SteveClement.
The comment on this change is: added gora reference.
http://wiki.apache.org/nutch/GettingNutchRunningWithUbuntu?action=diff&rev1=6&rev2=7

--------------------------------------------------

  # su - nutch
  }}}
  
- Checkout the code
+ Checkout the code AND the gora code
  
  {{{
- nu...@db2:~$ svn checkout http://svn.apache.org/repos/asf/lucene/nutch/
+ nu...@db2:~$ svn checkout http://svn.apache.org/repos/asf/nutch/trunk nutch
+ nu...@db2:~$ cd nutch
+ nu...@db2:~/nutch$ svn checkout https://svn.apache.org/repos/asf/incubator/gora/
  }}}
  
  Since this tutorial is for getting trunk to work, let's go there
  
  {{{
- nu...@db2:~ $ cd ~/nutch/trunk/
+ nu...@db2:~ $ cd ~/nutch
  }}}
  
  We build with ant
  
  {{{
- nu...@db2:~/nutch/trunk $ ant
+ nu...@db2:~/nutch $ ant
  }}}
  
  And build a war for tomcat and later searching
@@ -96, +98 @@

  ''If you are using the latest "trunk" stuff, the url seeding has been changed from a single file to a directory.  Using trunk (after 0.7.2), put the urls in a file (here, called "nutch") in a DIRECTORY called "urls":''
  
  {{{
- nu...@db2:~/nutch/trunk $ mkdir urls
+ nu...@db2:~/nutch $ mkdir urls
- nu...@db2:~/nutch/trunk $ echo 'http://lucene.apache.org/nutch/' > urls/nutch
+ nu...@db2:~/nutch $ echo 'http://lucene.apache.org/nutch/' > urls/nutch
  }}}
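  As a rough sketch of the directory-style seeding described above, the same two commands can be tried in a scratch location first. The `seed_root` and `seed_count` variable names and the use of `mktemp` are assumptions for illustration only and are not part of the wiki page.

```shell
# Sketch only: build the seed layout in a temporary directory (assumed
# location) so nothing in the real checkout is touched.
seed_root="$(mktemp -d)"

# A DIRECTORY called "urls" holding a seed FILE called "nutch".
mkdir -p "$seed_root/urls"
echo 'http://lucene.apache.org/nutch/' > "$seed_root/urls/nutch"

# The seed file holds one URL per line; count them as a sanity check.
seed_count="$(grep -c '^http' "$seed_root/urls/nutch")"
echo "seed URLs: $seed_count"
```

  If the count looks right, the same `mkdir` and `echo` commands can be repeated inside the checkout as shown in the diff.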
  
  ''Using 0.7.2 or before, just put urls in a FILE called "urls":''
  
  {{{
- nu...@db2:~/nutch/trunk $ echo 'http://lucene.apache.org/nutch/' > urls
+ nu...@db2:~/nutch $ echo 'http://lucene.apache.org/nutch/' > urls
  }}}
  
  Then, in any case, you specify in the same fashion ("urls" below referring either to a dir or a file, depending on the version you're using):
  
  {{{
- nu...@db2:~/nutch/trunk $ perl -pi -e 's|MY.DOMAIN.NAME|lucene.apache.org/nutch|' \
+ nu...@db2:~/nutch $ perl -pi -e 's|MY.DOMAIN.NAME|lucene.apache.org/nutch|' \
    conf/crawl-urlfilter.txt
- nu...@db2:~/nutch/trunk $ bin/nutch crawl urls -dir crawl.test -depth 3
+ nu...@db2:~/nutch $ src/bin/nutch crawl urls -dir crawl.test -depth 3
  }}}
  
  See, perl can be useful :)
