Hi, I have a question about the "mapred.reduce.parallel.copies" configuration parameter in Hadoop. The mapred-default.xml file describes it as "The default number of parallel transfers run by reduce during the copy(shuffle) phase." Is this the number of slave nodes from which a reduce task reads in parallel? Or is it the number of intermediate map outputs that a reduce task can read from in parallel?
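For reference, here is the form the property takes in mapred-site.xml, following the property format of mapred-default.xml (a minimal sketch; the value 5 matches my example below, and 5 is also the shipped default):

    <property>
      <name>mapred.reduce.parallel.copies</name>
      <value>5</value>
      <!-- Description from mapred-default.xml: "The default number of
           parallel transfers run by reduce during the copy(shuffle)
           phase." -->
    </property>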
For example, suppose I have 4 slave nodes and run a job with 800 maps and 4 reducers, with mapred.reduce.parallel.copies=5. Can each reduce task then read from all 4 nodes in parallel, i.e., make at most 4 concurrent connections, one per node? Or can it fetch 5 of the 800 map outputs at once, which (with 5 fetches spread over only 4 nodes) means at least 2 concurrent connections to a single node?

In essence, I am trying to determine, for any given Hadoop cluster and job configuration, how many reducers would be accessing a single disk concurrently, as a function of the various parameters that can be specified in the configuration files.

Thanks,
Virajith