Joydeep Sen Sarma wrote:
One of the controversies is whether, in the presence of failures, this makes performance worse rather than better (kind of like UDP vs. TCP: which is better depends on the error rate). The probability of a failure per job will increase non-linearly as the number of nodes involved per job increases. So what might make sense for small clusters may not make sense for bigger ones. But it sure would be nice to have this option.
Hmm. Personally I wouldn't put a very high priority on complicated features that don't scale well.
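
To give a rough sense of the scaling concern, here is a minimal sketch. It assumes each node fails independently with the same probability during a job, which is a simplification and not something measured on a real cluster. The function name and the 0.001 per-node rate are hypothetical, chosen only for illustration.

```python
def job_failure_probability(p_node: float, n_nodes: int) -> float:
    """Chance that at least one of n_nodes fails during a job.

    Assumes independent, identically likely node failures (an
    illustrative simplification, not a measured model).
    """
    if not 0.0 <= p_node <= 1.0:
        raise ValueError("p_node must be in [0, 1]")
    if n_nodes < 0:
        raise ValueError("n_nodes must be non-negative")
    # The job survives only if every node survives.
    return 1.0 - (1.0 - p_node) ** n_nodes


if __name__ == "__main__":
    # Hypothetical per-node failure rate of 0.1% per job.
    for n in (10, 100, 1000, 10000):
        print(f"{n:>6} nodes: {job_failure_probability(0.001, n):.3f}")
```

Under this assumption the per-job failure probability climbs toward 1 as nodes are added, so a feature that pays off only when failures are rare may stop paying off on large clusters.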
Doug
