Doug Cutting wrote:
Owen O'Malley wrote:
allssh -h node1000-3000 bin/hadoop-daemon.sh start tasktracker
and it will use ssh in parallel to connect to every node between
node1000 and node3000. Ours is a mess, but it would be great if
someone contributed a script like that. *smile*
It would be a one-line change to bin/slaves.sh to have it filter hosts
by a regex.
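A minimal sketch of what such a filter might look like, not the actual bin/slaves.sh change. The function name filter_hosts is hypothetical, and the file format (one host per line, with # comments and blank lines allowed) is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print only the hosts in a slaves file that match an
# extended regex. Assumes one host per line; "#" comments and blank lines
# are stripped before matching.
filter_hosts() {
  local slaves_file=$1 pattern=$2
  sed -e 's/#.*$//' -e '/^[[:space:]]*$/d' "$slaves_file" | grep -E -- "$pattern"
}

# Possible use inside a slaves-style loop (not executed here):
#   for host in $(filter_hosts conf/slaves '^node(1[0-9]{3}|2[0-9]{3}|3000)$'); do
#     ssh "$host" bin/hadoop-daemon.sh start tasktracker &
#   done
```

A regex has to spell out a numeric range such as 1000-3000 digit by digit, which is the main cost of this approach compared with a dedicated range syntax.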
Note that bin/slaves.sh can have problems with larger clusters (>~100
nodes) since a single shell has trouble handling the i/o from 100
sub-processes, and ssh connections will start timing out. That's the
point of the HADOOP_SLAVE_SLEEP parameter, to meter the rate at which
sub-processes are spawned. A better solution might be to sleep if the
number of sub-processes exceeds some limit, e.g.:
while [[ $(jobs | wc -l) -gt 10 ]]; do sleep 1 ; done
Doug
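For illustration, the throttling idea above could be wrapped in a loop along these lines. This is a sketch under assumptions, not code from bin/slaves.sh: run_throttled is a hypothetical name, hosts are read one per line from stdin, and bash is assumed for the jobs builtin.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run "$@ <host>" in the background for each host read
# from stdin, waiting whenever max_jobs background jobs are already running.
run_throttled() {
  local max_jobs=$1; shift
  local host
  while read -r host; do
    # Hold off spawning while the running-job count is at the limit.
    while (( $(jobs -r | wc -l) >= max_jobs )); do
      sleep 1
    done
    "$@" "$host" &
  done
  wait   # let the last batch finish before returning
}

# Possible use (not executed here); -n keeps ssh from reading the host list:
#   run_throttled 10 ssh -n < conf/slaves
```

Unlike a fixed sleep between spawns, this only pauses when the limit is actually reached, so fast connections are not slowed down.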
The trick there is for your script to pick the first couple of nodes and
give each of them half the work. They do the same thing down the tree, and
you end up with the cluster booting itself at a rate that includes
log2(N) somewhere in the equations.
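To make the shape of that tree concrete, here is a small simulation. It is only a sketch: fanout and delegate are hypothetical names, and instead of running ssh it prints "<depth> <host>" for every host, where depth is the number of hand-offs before that host gets started.

```shell
#!/usr/bin/env bash
# Hypothetical simulation of tree-style startup. Nothing is started; each
# host is printed with the depth at which it would be reached.

# Start the first host of a share and let it handle the rest of that share.
delegate() {
  local depth=$1; shift
  (( $# == 0 )) && return 0
  local head=$1; shift
  echo "$depth $head"            # real use: ssh "$head" <start command>
  fanout $((depth + 1)) "$@"     # real use: this step would run on $head
}

# Split the remaining hosts into two halves and hand each half to its first host.
fanout() {
  local depth=$1; shift
  local n=$#
  (( n == 0 )) && return 0
  local half=$(( (n + 1) / 2 ))
  delegate "$depth" "${@:1:half}"
  delegate "$depth" "${@:half+1}"
}
```

Running `fanout 0 n1 n2 n3 n4 n5 n6 n7` lists all seven hosts with a maximum depth of 2, so the number of sequential hand-offs grows roughly with log2(N) rather than with N.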