Great stuff, Paul!

A few minor corrections.

Apache Wiki wrote:
  1. The env var NUTCH_MASTER is set to the hostname of the master machine.

This is optional. The alternative is to mount a common home directory with NFS, as many clusters do, and keep the Nutch software there.

Also, NUTCH_MASTER is an rsync path, so it should be set to something of the form host:/path/to/nutch, e.g., "foo.bar.com:/home/$USER/src/nutch".
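
For instance, a minimal shell sketch of setting the variable and checking that it has the host:/path form. The host and path are the example values from above, and the names master_host and master_path are only illustrative, not anything the Nutch scripts define:

```shell
# Example values only; substitute your own master host and install path.
NUTCH_MASTER="foo.bar.com:/home/$USER/src/nutch"
export NUTCH_MASTER

# Split the rsync-style value into its host and path parts
# (master_host and master_path are hypothetical helper names).
master_host="${NUTCH_MASTER%%:*}"
master_path="${NUTCH_MASTER#*:}"

# Warn if the value does not look like host:/path/to/nutch.
case "$NUTCH_MASTER" in
  *:/*) echo "host=$master_host path=$master_path" ;;
  *)    echo "NUTCH_MASTER should look like host:/path/to/nutch" >&2 ;;
esac
```

A bare hostname with no ":/path" part would fall through to the warning branch.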

  2. The slave nodes are defined by putting list of hostnames, one per line, in ~/.slaves (alternatively, use NUTCH_SLAVES to refer to a different file).

This location can be altered with the environment variable NUTCH_SLAVES.
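
As a rough sketch, assuming a throwaway file location and made-up hostnames (slave1.example.com and slave2.example.com are placeholders), pointing NUTCH_SLAVES at a different file might look like this. The fallback to ~/.slaves in the last lines is my shorthand for the default described above, not a copy of what the scripts do:

```shell
# Hypothetical file location and hostnames, for illustration only.
slaves_file="${TMPDIR:-/tmp}/nutch-slaves.example"
printf '%s\n' slave1.example.com slave2.example.com > "$slaves_file"

# Point Nutch at this file instead of the default ~/.slaves.
NUTCH_SLAVES="$slaves_file"
export NUTCH_SLAVES

# Count the non-empty lines (one hostname per line) in whichever file applies.
slave_count=$(grep -c . "${NUTCH_SLAVES:-$HOME/.slaves}")
echo "$slave_count slave hosts listed in ${NUTCH_SLAVES:-$HOME/.slaves}"
```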

Thanks for writing this.

Doug
