Re: Zeroconf for hadoop

Steve Loughran Tue, 27 Jan 2009 02:54:31 -0800

Doug Cutting wrote:

Owen O'Malley wrote:
allssh -h node1000-3000 bin/hadoop-daemon.sh start tasktracker
and it will use ssh in parallel to connect to every node between node1000 and node3000. Ours is a mess, but it would be great if someone contributed a script like that. *smile*
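A minimal sketch of what an allssh-style helper could look like. It assumes host names are a fixed prefix plus an unpadded number (as in node1000-3000). `expand_hosts` and `allssh` are hypothetical names, and this is not the script Owen describes. It starts one backgrounded ssh per host, prefixes each output line with the host name, and then waits for all of them.

```bash
#!/usr/bin/env bash
# Hypothetical "allssh" sketch; not the script referred to in the thread.

# expand_hosts SPEC: "node1000-1003" -> node1000 node1001 node1002 node1003
# Assumes SPEC is PREFIX + unpadded START[-END]; zero padding is not preserved.
expand_hosts() {
  local spec=$1
  local prefix=${spec%%[0-9]*}      # leading non-digit part, e.g. "node"
  local range=${spec#"$prefix"}     # e.g. "1000-3000" (or just "7")
  local start=${range%-*} end=${range#*-}
  local i
  # 10# forces base 10 so numbers with leading zeros are not read as octal
  for (( i = 10#$start; i <= 10#$end; i++ )); do
    echo "$prefix$i"
  done
}

# allssh [-h] SPEC CMD...: run CMD on every host in SPEC, all at once.
allssh() {
  [[ $1 == -h ]] && shift           # accept the "-h RANGE" form used above
  local spec=$1
  shift
  local host
  for host in $(expand_hosts "$spec"); do
    # -n: backgrounded ssh must not read our stdin; prefix output with host
    ssh -n -o BatchMode=yes "$host" "$@" 2>&1 | sed "s/^/$host: /" &
  done
  wait                              # block until every ssh has exited
}
```

It would be invoked as `allssh -h node1000-3000 bin/hadoop-daemon.sh start tasktracker`. Because it spawns every ssh at once, it runs straight into the large-cluster problem Doug describes.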
It would be a one-line change to bin/slaves.sh to have it filter hosts by a regex.
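The filtering step could look something like this. `HADOOP_SLAVE_FILTER` and `filtered_slaves` are made-up names for this sketch, not existing Hadoop settings. It assumes the slaves file has one host per line with `#` comments allowed; the slaves.sh loop would then iterate over this function's output instead of the raw file.

```bash
#!/usr/bin/env bash
# filtered_slaves FILE: print the hosts in FILE that match the extended regex
# in $HADOOP_SLAVE_FILTER (a hypothetical variable). The default "." keeps
# every non-blank host, so behaviour is unchanged when no filter is set.
filtered_slaves() {
  local slaves_file=$1
  local regex=${HADOOP_SLAVE_FILTER:-.}
  # strip comments, drop blank lines, then keep only matching hosts
  sed -e 's/#.*$//' -e '/^[[:space:]]*$/d' "$slaves_file" | grep -E -- "$regex"
}
```

For example, `HADOOP_SLAVE_FILTER='^node1' filtered_slaves conf/slaves` would restrict a run to hosts whose names start with node1.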
Note that bin/slaves.sh can have problems with larger clusters (more than ~100 nodes), since a single shell has trouble handling the I/O from 100 sub-processes, and ssh connections will start timing out. That's the point of the HADOOP_SLAVE_SLEEP parameter: to meter the rate at which sub-processes are spawned. A better solution might be to sleep while the number of running sub-processes exceeds some limit, e.g.:
  while [[ $(jobs -r | wc -l) -gt 10 ]]; do sleep 1; done

Doug
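Doug's throttling idea, sketched as a complete function under some assumptions: `throttled_ssh` is a hypothetical name, hosts arrive one per line on stdin, and the limit is the first argument. The count uses `jobs -rp` (running jobs only, one PID per job) with a numeric comparison; inside `[[ ]]`, a bare `>` compares strings, so `9` sorts after `10`.

```bash
#!/usr/bin/env bash
# throttled_ssh MAX CMD... < hostsfile   (hypothetical helper, not slaves.sh)
# Starts one ssh per host, but never lets more than MAX run at once.
throttled_ssh() {
  local max=$1
  shift
  local host
  while read -r host; do
    # skip blank lines and '#' comments, as a slaves file may contain them
    [[ -z $host || $host == \#* ]] && continue
    # Doug's loop with a numeric test: block while MAX jobs are still running
    while (( $(jobs -rp | wc -l) >= max )); do
      sleep 1
    done
    # -n keeps ssh from swallowing the host list on our stdin
    ssh -n "$host" "$@" 2>&1 | sed "s/^/$host: /" &
  done
  wait
}
```

Usage would be along the lines of `throttled_ssh 10 bin/hadoop-daemon.sh start tasktracker < conf/slaves`. Polling with `sleep 1` mirrors Doug's one-liner and works on older bash; on bash 4.3 or later, `wait -n` could replace the polling loop.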

The trick there is for your script to pick the first couple of nodes and give each of them half the work; they do the same thing down the tree, and you end up with the cluster booting itself in a number of rounds that grows with log2(N) rather than N.
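A sketch of that fan-out, assuming every node could run the same script. To stay self-contained it only simulates the tree locally and prints one plan line, `DEPTH PARENT CHILD`, per host. In a real deployment the `echo` in `delegate` would become an ssh to the child that starts its daemon and then runs `fanout` there for the child's share of the list. `fanout` and `delegate` are hypothetical names. Each node hands off to at most two children, so if every level runs in parallel, the number of rounds is the largest DEPTH printed, roughly log2(N).

```bash
#!/usr/bin/env bash
# fanout PARENT DEPTH HOST...
# PARENT splits HOST... into two halves and hands each half to a delegate.
fanout() {
  local parent=$1 depth=$2
  shift 2
  (( $# == 0 )) && return 0
  local mid=$(( ($# + 1) / 2 ))
  local left=("${@:1:mid}") right=("${@:mid+1}")
  delegate "$parent" "$depth" "${left[@]}"
  if (( ${#right[@]} > 0 )); then
    delegate "$parent" "$depth" "${right[@]}"
  fi
}

# delegate PARENT DEPTH HOST...
# The first host of the sub-list becomes the child; it would start its own
# daemon and then repeat fanout over the rest of the sub-list, one level deeper.
delegate() {
  local parent=$1 depth=$2
  shift 2
  local child=$1
  shift
  # Simulation only: print the plan instead of ssh-ing to "$child".
  echo "$depth $parent $child"
  fanout "$child" $(( depth + 1 )) "$@"
}
```

For 1000 hosts (`fanout admin 1 h{1..1000}`) the deepest level printed is 9, compared with 1000 sequential spawns from a single machine.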
