Jason Venner wrote:
When my reduce is running, on the status page I see the following for the incomplete reduces:

reduce > copy (643 of 789 at 0.12 MB/s) >

Reducers cannot copy any faster than mappers can generate output. When all maps are complete, how long does it take before copying is complete? If that delay is small, then copying is keeping up with map output.

Is that the actual transfer rate between machines, or is that a misleading number?

It's the rate at which a given reduce task is able to fetch map output. If you're running multiple reduce tasks per node, then that node's aggregate rate will be higher. As mentioned above, it's limited by the rate at which maps generate output. Copying also competes with map input for disk and network bandwidth.
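
As a rough illustration of the arithmetic, here is a minimal Python sketch that parses a status line of the form shown above and estimates a node's aggregate copy rate. The function names are hypothetical and not part of Hadoop, and the estimate assumes every reduce task on the node sees roughly the same per-task rate, which may not hold in practice.

```python
import re

# Matches the copy portion of a status line such as
# "reduce > copy (643 of 789 at 0.12 MB/s) >".
STATUS_RE = re.compile(r"copy \((\d+) of (\d+) at ([\d.]+) MB/s\)")


def parse_copy_status(line):
    """Return (map outputs copied, total map outputs, per-task rate in MB/s)."""
    match = STATUS_RE.search(line)
    if match is None:
        raise ValueError("not a reduce copy status line: %r" % line)
    copied, total, rate = match.groups()
    return int(copied), int(total), float(rate)


def node_copy_rate(per_task_rate_mb_s, reduce_tasks_per_node):
    """Estimate a node's aggregate copy rate in MB/s.

    Assumption: all concurrent reduce tasks on the node copy at about
    the same rate, so the aggregate is a simple product.
    """
    return per_task_rate_mb_s * reduce_tasks_per_node


if __name__ == "__main__":
    copied, total, rate = parse_copy_status(
        "reduce > copy (643 of 789 at 0.12 MB/s) >"
    )
    print("map outputs remaining:", total - copied)
    # The 4 reduce tasks per node is an example value, not from the thread.
    print("estimated node rate (MB/s):", node_copy_rate(rate, 4))
```

With the status line from the question, this reports 146 map outputs still to be copied. The node-level figure is only an estimate, and a low value does not by itself mean the network is slow, since the maps may simply not have produced their output yet.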

Doug
