Chris K Wensel wrote:
Any comments on the probability (currently) that reads by a Task are over the network vs. being "local", as seen in your tests? That is, are 10% of block reads over the network, or 90% of reads?

In a sort job, well over 90% of map reads are typically local, more like 98-99%. But map input is not the bottleneck in sort: shuffle and reduce output are both considerably slower. So this kind of optimization is likely to have a significant impact only for jobs whose maps do not produce much output.

Doug
