[ http://issues.apache.org/jira/browse/HADOOP-717?page=comments#action_12449581 ]

Devaraj Das commented on HADOOP-717:
------------------------------------
This is handled by HADOOP-331 (work in progress).

> When there are few reducers, sorting should be done by mappers
> --------------------------------------------------------------
>
> Key: HADOOP-717
> URL: http://issues.apache.org/jira/browse/HADOOP-717
> Project: Hadoop
> Issue Type: Improvement
> Components: mapred
> Reporter: arkady borkovsky
>
> If I understand correctly, sorting currently happens on the reducer side.
> So if a few hundred mappers produce a few (or many) gigabytes of data, and
> there is just ONE reducer to consume it, copying and sorting takes forever.
> It may make sense to have a special-case optimization for a single reducer
> (e.g. "when there is only one reducer and many mappers, the sort is done by
> the mappers, and the reducer does only a merge").
> Or to have some smarter policy that makes sure that sorting uses as many CPUs
> as makes sense. If the map step has produced data on all the nodes of
> the cluster, it makes sense to use all the nodes for sorting.
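
The special case proposed in the issue (each mapper sorts its own output, and the single reducer only merges) can be sketched roughly as follows. This is a minimal illustration in plain Python, not Hadoop code: the names `map_side_sort` and `reduce_side_merge` and the sample records are hypothetical, and nothing here is meant to reflect how HADOOP-331 actually implements it.

```python
import heapq
from typing import Iterable, List, Tuple

# A record is a (key, value) pair; this shape is assumed for illustration.
Record = Tuple[str, int]


def map_side_sort(map_output: Iterable[Record]) -> List[Record]:
    """Sort one mapper's output by key, locally, before it is copied anywhere."""
    return sorted(map_output, key=lambda kv: kv[0])


def reduce_side_merge(sorted_runs: List[List[Record]]) -> List[Record]:
    """Merge already-sorted runs; the reducer does no full sort of its own."""
    return list(heapq.merge(*sorted_runs, key=lambda kv: kv[0]))


# Hypothetical output of three mappers, each unsorted.
mapper_outputs = [
    [("pear", 1), ("apple", 1)],
    [("fig", 1), ("apple", 2)],
    [("kiwi", 1)],
]

# The sort work is spread across the mappers (one sorted run per mapper).
runs = [map_side_sort(out) for out in mapper_outputs]

# The single reducer performs only a k-way merge of those runs.
merged = reduce_side_merge(runs)
```

With k sorted runs totalling n records, the merge costs roughly O(n log k) comparisons on the reducer, while the more expensive per-run sorting happens in parallel on the nodes that produced the data.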
