Bryan A. Pendleton wrote:
I would still like to see some of these site preferences be more
dynamic. For instance, I will soon be using both single-CPU and
dual-CPU machines, with varying amounts of RAM. I'd happily have an
extra job or two scheduled on the dual-CPU machines, to keep them
utilized and take better advantage of the RAM (which mostly serves as
disk cache under my current loads). But there's no way to set a
different tasks.maximum for each node (or a concept of "class of
node") at this point.
Sure there is: a separate config file per node.
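For illustration, here is a minimal sketch of how a per-node override
would surface through the normal config lookup, assuming the node's own
hadoop-site.xml is on its classpath; the property name and the default
of 2 are assumptions, so check hadoop-default.xml for your version:

import org.apache.hadoop.conf.Configuration;

public class NodeTaskLimit {
  public static void main(String[] args) {
    // new Configuration() picks up hadoop-default.xml plus whatever
    // hadoop-site.xml sits on this node's classpath, so a per-node
    // override of the task limit needs no code changes at all.
    Configuration conf = new Configuration();
    // Property name is illustrative; 2 is an assumed default.
    int maxTasks = conf.getInt("mapred.tasktracker.tasks.maximum", 2);
    System.out.println("tasks.maximum on this node: " + maxTasks);
  }
}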
If you'd like to make this automatic, that would be great. We'd need
portable Java code to detect the amount of memory and the number of
CPUs. Perhaps this could be done by running some shell commands and
parsing their output, relying on cygwin for Windows support?
Owen's recent benchmark posting showed that machines with a 5x
performance variation were used effectively during the map phase, but
that slow machines still hurt reduce performance. He's filed a bug and
will likely fix it (if past experience is any guide):
http://issues.apache.org/jira/browse/HADOOP-253
Adapting to variability of resources is still a big problem across
Hadoop. Performance still drops off very rapidly in many cases if you
have a weak node: there's no speculative reduce execution, there are
bugs in speculative map execution, and filled-up disk space is handled
badly during DFS writes as well as MapOutputFile writes. In fact,
anything that calls "getLocalPath" gets spread uniformly across the
available drives with no "full" check, so filling up any one drive in
the entire cluster can cause all kinds of things to fail.
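One cheap mitigation would be to have the local-path selection skip
directories whose volume looks nearly full. A minimal sketch of such a
check, shelling out to "df -k": the 1 GB floor and the class and method
names are made up, and this is not what getLocalPath does today:

import java.io.BufferedReader;
import java.io.File;
import java.io.InputStreamReader;

public class LocalDirChooser {
  private static final long MIN_FREE_KB = 1024L * 1024L; // assumed 1 GB floor

  // Free space in kB on the volume holding 'dir', parsed from "df -k";
  // returns -1 if df cannot be run or its output cannot be parsed.
  static long freeKb(File dir) {
    try {
      Process p = Runtime.getRuntime().exec(
          new String[] { "df", "-k", dir.getAbsolutePath() });
      BufferedReader in =
          new BufferedReader(new InputStreamReader(p.getInputStream()));
      in.readLine();                       // skip the header line
      String[] f = in.readLine().trim().split("\\s+");
      return Long.parseLong(f[3]);         // the "Available" column
    } catch (Exception e) {
      return -1;
    }
  }

  // Round-robin over the configured local dirs, skipping any that look full.
  static File chooseDir(File[] localDirs, int hint) {
    for (int i = 0; i < localDirs.length; i++) {
      File candidate = localDirs[(hint + i) % localDirs.length];
      if (freeKb(candidate) > MIN_FREE_KB) {
        return candidate;
      }
    }
    return null; // every drive is nearly full: fail explicitly, not mysteriously
  }
}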
Sounds like a good list of things to work on. Want to take on solving
any of these? They won't fix themselves...
Doug