Re: what is mapred.reduce.parallel.copies?

Virajith Jalaparti Tue, 28 Jun 2011 15:04:36 -0700

I am using 0.20.2. So you mean mapred.reduce.parallel.copies is the number of map outputs from which a reduce task can concurrently read data? I understand that it is the number of concurrent copier threads in ReduceTask. But what is the source for each of these threads? Is it a single slave node, or is it a single partition sent from a particular map?


Thanks
Virajith


On 6/28/2011 9:59 PM, Ted Yu wrote:

Which Hadoop version are you using?

If it is 0.20.2, mapred.reduce.parallel.copies is the number of copying threads in ReduceTask.

In the scenario you described, at least 2 concurrent connections to a single node would be made.
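
The arithmetic behind that figure can be sketched as a pigeonhole bound. This is only an illustration under two assumptions that the thread does not confirm for 0.20.2: all copier threads are busy at the same moment, and nothing in the scheduler limits how many of them fetch from the same node. The function name is hypothetical and not part of Hadoop.

```python
def busiest_node_lower_bound(parallel_copies: int, nodes: int) -> int:
    """Lower bound on concurrent fetches hitting the busiest node.

    Assumes (hypothetically) that every copier thread is active and that
    there is no per-host limit on concurrent fetches.
    """
    if parallel_copies < 0 or nodes <= 0:
        raise ValueError("parallel_copies must be >= 0 and nodes must be > 0")
    # Ceiling division: spreading the fetches as evenly as possible still
    # leaves at least this many on one node.
    return (parallel_copies + nodes - 1) // nodes


# Scenario from the question: 5 copier threads, 4 slave nodes.
print(busiest_node_lower_bound(5, 4))
```

With mapred.reduce.parallel.copies=5 and 4 nodes, the bound is 2 for one reduce task; it says nothing about how fetches from different reduce tasks overlap on a given disk.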


I am not familiar with newer versions of Hadoop.

On Tue, Jun 28, 2011 at 11:31 AM, Virajith Jalaparti <virajit...@gmail.com> wrote:


    Hi,

    I have a question about the "mapred.reduce.parallel.copies"
    configuration parameter in Hadoop. The mapred-default.xml file
    says it is "The default number of parallel transfers run by reduce
      during the copy(shuffle) phase."
    Is this the number of slave nodes from which a reduce task reads
    in parallel? Or is it the number of intermediate map outputs
    that a reduce task can read in parallel?

    For example, suppose I have 4 slave nodes and run a job with 800
    maps and 4 reducers with mapred.reduce.parallel.copies=5. Can each
    reduce task read from all 4 nodes in parallel, i.e. make only 4
    concurrent connections, one to each of the 4 nodes? Or can it
    read from 5 of the 800 map outputs, i.e. make at least 2
    concurrent connections to a single node?

    In essence, I am trying to determine how many reducers would be
    accessing a single disk, concurrently, in any given Hadoop cluster
    for any job configuration as a function of the various parameters
    that can be specified in the configuration files.

    Thanks,
    Virajith
