The easiest way would be to not use anything but your reliable machines
as datanodes. Alternately, for better performance, you could run two
DFS systems: one on all machines, and one on just the reliable machines,
and back one up to the other before you shut down the unreliable nodes
each
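The two-DFS backup step above could be sketched with distcp. This is a minimal sketch, not a tested recipe: the namenode hosts, ports, and paths below are hypothetical placeholders, and it assumes both clusters are reachable from the machine running the command.

```shell
# Sketch: copy data from the DFS running on all machines to the DFS
# running on only the reliable machines, before shutting down the
# unreliable nodes. Hosts, ports, and paths are placeholders.
hadoop distcp \
  hdfs://all-machines-nn:9000/user/data \
  hdfs://reliable-nn:9000/backup/user/data
```

Run from cron (or by hand) before each planned shutdown of the unreliable nodes; distcp runs as a MapReduce job, so the copy itself is parallelized.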
Doug,

It doesn't matter how the code is structured; what does matter is that
the reduce phase and the shuffle phase have very different timelines and
resource requirements, and should not both be charged against the number
of reduce tasks permitted.
it should be possible to have lots of tasks in the
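One way to read the point above is that shuffle parallelism and reduce-slot count are (or should be) separate knobs. As a hedged configuration sketch only, Hadoop's classic MapReduce config exposes properties along these lines; the values below are illustrative, not recommendations:

```xml
<!-- Sketch: shuffle fetch parallelism is tuned separately from the
     number of reduce tasks a node may run concurrently. -->
<property>
  <name>mapred.reduce.parallel.copies</name>
  <value>10</value> <!-- parallel map-output fetches during shuffle -->
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value> <!-- concurrent reduce tasks per tasktracker -->
</property>
```

Raising the first lets many fetches overlap during the shuffle without also committing the node to running that many full reduce tasks at once.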