[jira] Commented: (HADOOP-717) When there are few reducers, sorting should be done by mappers

Devaraj Das (JIRA) Mon, 13 Nov 2006 22:53:58 -0800

    [ http://issues.apache.org/jira/browse/HADOOP-717?page=comments#action_12449581 ]
            
Devaraj Das commented on HADOOP-717:
------------------------------------


This is handled by HADOOP-331 (work in progress).

> When there are few reducers, sorting should be done by mappers
> --------------------------------------------------------------
>
>                 Key: HADOOP-717
>                 URL: http://issues.apache.org/jira/browse/HADOOP-717
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: arkady borkovsky
>
> If I understand correctly, sorting currently happens on the reducer side.
> So if a few hundred mappers produce a few (or many) gigabytes of data, and
> there is just ONE reducer to consume it, copying and sorting take forever.
> It may make sense to have a special-case optimization for a single reducer
> (e.g. "when there is only one reducer and many mappers, sorting is done by
> the mappers, and the reducer does only a merge").
> Or to have some smarter policy that makes sure that sorting uses as many CPUs
> as makes sense. If the map step has produced data on all the nodes of the
> cluster, it makes sense to use all the nodes for sorting.
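The single-reducer optimization proposed above can be sketched in a few lines, assuming in-memory lists of (key, value) pairs stand in for map outputs. The names `map_side_sort` and `reduce_side_merge` are hypothetical illustrations, not Hadoop APIs, and this is not how Hadoop itself implements the shuffle. Each mapper sorts its own output, and the reducer performs only a k-way merge of the already-sorted runs.

```python
import heapq
from typing import Iterable, List, Tuple

# Hypothetical record type: a (key, value) pair emitted by a mapper.
Record = Tuple[str, int]


def map_side_sort(map_output: Iterable[Record]) -> List[Record]:
    """Each mapper sorts its own output by key, in parallel with the others."""
    return sorted(map_output, key=lambda kv: kv[0])


def reduce_side_merge(sorted_runs: List[List[Record]]) -> List[Record]:
    """The single reducer only merges the pre-sorted runs (k-way merge)."""
    return list(heapq.merge(*sorted_runs, key=lambda kv: kv[0]))


# Usage: three mappers with unsorted output, one reducer.
mapper_outputs = [
    [("pear", 1), ("apple", 1), ("fig", 1)],
    [("banana", 1), ("apple", 1)],
    [("cherry", 1)],
]
runs = [map_side_sort(out) for out in mapper_outputs]
merged = reduce_side_merge(runs)
```

With N records spread over k mappers, the reducer's work drops from a full O(N log N) sort to an O(N log k) merge, while the per-mapper sorts run concurrently across the cluster, which is the "use all the nodes for sorting" effect the description asks for.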

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
