Great stuff, Paul! A few minor corrections.
Apache Wiki wrote:
1. The env var NUTCH_MASTER is set to the hostname of the master machine.
This is optional. The alternative is to mount a common home directory with NFS, as many clusters do, and keep the Nutch software there.
Also, NUTCH_MASTER is an rsync path, so it should be set to something of the form host:/path/to/nutch, e.g., "foo.bar.com:/home/$USER/src/nutch".
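For instance, here is a minimal sketch of setting it from a Bourne-style shell, reusing the example value above. The host and path are placeholders, and master_host and master_path are illustrative names used only to show the two halves of the value; they are not variables that Nutch itself reads.

```shell
# Fall back to `id -un` in case USER is not set in this environment.
user="${USER:-$(id -un)}"

# rsync-style location: host, a colon, then the path to the Nutch install.
# foo.bar.com and the path are placeholder values, not real defaults.
NUTCH_MASTER="foo.bar.com:/home/$user/src/nutch"
export NUTCH_MASTER

# Illustrative only: split the value at the first colon to check its form.
master_host="${NUTCH_MASTER%%:*}"   # everything before the first colon
master_path="${NUTCH_MASTER#*:}"    # everything after the first colon
echo "host=$master_host path=$master_path"
```

If the value contains no colon, both expansions return the whole string, which is an easy way to spot a bare hostname where a host:/path pair was intended.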
2. The slave nodes are defined by putting list of hostnames, one per line, in ~/.slaves (alternatively, use NUTCH_SLAVES to refer to a different file).
This location can be altered with the environment variable NUTCH_SLAVES.

Thanks for writing this.

Doug
