Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "NutchHadoopTutorial" page has been changed by ChiaHungLin.
http://wiki.apache.org/nutch/NutchHadoopTutorial?action=diff&rev1=26&rev2=27

--------------------------------------------------

  scp -r /nutch/search/* nutch@computer:/nutch/search
  }}}
  
+ '''The main point is to copy the nutch-* files and crawl-urlfilter.txt (both 
under $nutch_home/conf) into the $hadoop_home/conf folder, so that the Hadoop 
cluster picks up this configuration at startup. Otherwise the cluster will 
complain with messages such as "0 records selected for fetching, exiting .. 
URLs to fetch - check your seed list and URL filters."'''
+ 
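  As a minimal sketch of that copy step, run on each node ($nutch_home and 
$hadoop_home stand in for your actual Nutch and Hadoop install paths):
  
  {{{
  # copy the Nutch config files and URL filter into Hadoop's conf dir
  cp $nutch_home/conf/nutch-* $nutch_home/conf/crawl-urlfilter.txt $hadoop_home/conf/
  }}}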
  Do this for every computer you want to use as a slave node.  Then edit the 
slaves file, adding each slave node's name to the file, one per line.  You 
will also want to edit the hadoop-site.xml file and change the values for the 
map and reduce task numbers, making them a multiple of the number of machines 
you have.  For our system, which has 6 data nodes, I put in 32 as the number 
of tasks.  The replication property can also be changed at this time; a good 
starting value is 2 or 3.  *(See the note at the bottom about possibly having 
to clear the filesystem of new datanodes.)  Once this is done you should be 
able to start up all of the nodes.
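  As a sketch of what those hadoop-site.xml entries might look like (the 
property names mapred.map.tasks, mapred.reduce.tasks, and dfs.replication are 
the ones used by the Hadoop versions this tutorial targets; adjust the values 
to your own cluster):
  
  {{{
  <property>
    <name>mapred.map.tasks</name>
    <value>32</value>   <!-- scale with the number of slave nodes -->
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>32</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>    <!-- 2 or 3 is a reasonable starting point -->
  </property>
  }}}
  
  The slaves file itself is just a plain list of hostnames, one per line.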
  
  To start all of the nodes, we use the exact same command as before:
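  Assuming, as earlier in the tutorial, that everything lives under 
/nutch/search, that command is Hadoop's start-all.sh script, run on the 
master node:
  
  {{{
  cd /nutch/search
  bin/start-all.sh
  }}}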
