Doug Cutting wrote:
Owen O'Malley wrote:
allssh -h node1000-3000 bin/hadoop-daemon.sh start tasktracker

and it will use ssh in parallel to connect to every node between node1000 and node3000. Ours is a mess, but it would be great if someone contributed a script like that. *smile*
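
For anyone who wants to contribute one, here's a minimal sketch of such a wrapper. The script name, the -h nodeSTART-END parsing, and the node<N> hostname convention are assumptions for illustration, not Yahoo's actual tool, and note it has the same unthrottled fan-out problem described below:

  #!/usr/bin/env bash
  # allssh.sh (hypothetical sketch): run a command on nodeSTART..nodeEND via ssh
  # Usage: allssh.sh -h node1000-3000 bin/hadoop-daemon.sh start tasktracker
  if [ "$1" = "-h" ]; then
    range="$2"; shift 2
  fi
  start=${range#node}; start=${start%%-*}   # "node1000-3000" -> "1000"
  end=${range##*-}                          # "node1000-3000" -> "3000"
  for ((i = start; i <= end; i++)); do
    ssh "node$i" "$@" &                     # background ssh for parallelism
  done
  wait                                      # block until every connection finishes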

It would be a one-line change to bin/slaves.sh to have it filter hosts by a regex.
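
Something like the following, for example; this is a simplified sketch of the slaves.sh loop rather than the actual patch, with HOSTLIST standing in for whatever variable points at the slaves file and HADOOP_SLAVES_REGEX made up for illustration:

  # sketch: only hand hosts matching an optional regex to ssh
  for slave in $(grep -E "${HADOOP_SLAVES_REGEX:-.}" "$HOSTLIST"); do
    ssh $HADOOP_SSH_OPTS "$slave" "$@" 2>&1 | sed "s/^/$slave: /" &
  done
  wait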

Note that bin/slaves.sh can have problems with larger clusters (>~100 nodes), since a single shell has trouble handling the I/O from 100 sub-processes, and ssh connections will start timing out. That's the point of the HADOOP_SLAVE_SLEEP parameter: to meter the rate at which sub-processes are spawned. A better solution might be to sleep if the number of sub-processes exceeds some limit, e.g.:

  while [[ $(jobs | wc -l) -gt 10 ]]; do sleep 1; done
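
In context, that check could sit in the spawning loop in place of (or alongside) the fixed sleep; again a simplified sketch, not the script's exact contents:

  for slave in $(cat "$HOSTLIST"); do
    # wait while more than 10 background ssh jobs are still running
    while [[ $(jobs -r | wc -l) -gt 10 ]]; do sleep 1; done
    ssh $HADOOP_SSH_OPTS "$slave" "$@" 2>&1 | sed "s/^/$slave: /" &
  done
  wait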

Doug

The trick there is for your script to pick the first couple of nodes and give each of them half the work; they do the same thing on down the tree, and the cluster ends up booting itself at a rate with a log2(N) term somewhere in the equations.
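
A rough sketch of that divide-and-conquer idea (the script name is hypothetical, it assumes the script is installed on every node's PATH, and the quoting of the forwarded command is deliberately naive):

  #!/usr/bin/env bash
  # tree-ssh.sh (hypothetical sketch): run a command on a list of hosts by
  # handing half the remaining list to the head of each half, which recurses.
  # Usage: tree-ssh.sh '<command>' host1 host2 ... hostN
  cmd="$1"; shift
  hosts=("$@")
  n=${#hosts[@]}
  (( n == 0 )) && exit 0
  if (( n == 1 )); then
    ssh "${hosts[0]}" "$cmd"
  else
    half=$(( n / 2 ))
    left=("${hosts[@]:0:half}")
    right=("${hosts[@]:half}")
    # each half's head runs the command, then fans out to the rest of its
    # half, so the depth of ssh hops grows as log2(N) rather than N
    ssh "${left[0]}"  "$cmd; tree-ssh.sh '$cmd' ${left[*]:1}" &
    ssh "${right[0]}" "$cmd; tree-ssh.sh '$cmd' ${right[*]:1}" &
    wait
  fi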
