[ 
https://issues.apache.org/jira/browse/TEZ-145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101279#comment-14101279
 ] 

Gopal V commented on TEZ-145:
-----------------------------

Yes, this would be harder to implement if we retain the MR notions of 
combiners/mappers running in-proc.

I think an intermediate step would be to pair this idea with a chunked 
shuffle, which shuffles one sort buffer at a time.

A combine task is then easier to model, as the map tasks themselves will never 
run any combiner tasks in that model.

Logically, this turns MR into M-R-R-R with 
M-(host-local)R-(rack-local)R-(final)R.
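
To make that shape concrete, here is a minimal sketch in plain Java. It does 
not use any Tez API: the class, the MapOutput type and the method names are 
all hypothetical, and it assumes the combiner is an associative, commutative 
sum so that it can be applied at every level of the tree.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class HierarchicalCombine {

    // Hypothetical holder for one map task's partial output and where it ran.
    static final class MapOutput {
        final String rack;
        final String host;
        final Map<String, Long> partialSums;

        MapOutput(String rack, String host, Map<String, Long> partialSums) {
            this.rack = rack;
            this.host = host;
            this.partialSums = partialSums;
        }
    }

    // The combiner: merge partial sums key by key.
    // Assumed associative and commutative, so it is safe to apply repeatedly.
    static Map<String, Long> combine(Collection<Map<String, Long>> inputs) {
        Map<String, Long> out = new TreeMap<>();
        for (Map<String, Long> in : inputs) {
            in.forEach((k, v) -> out.merge(k, v, Long::sum));
        }
        return out;
    }

    // M-(host-local)R-(rack-local)R-(final)R over a list of map outputs.
    static Map<String, Long> combineTree(List<MapOutput> outputs) {
        // Group map outputs by rack, then by host within the rack.
        Map<String, Map<String, List<Map<String, Long>>>> byRackHost = new TreeMap<>();
        for (MapOutput o : outputs) {
            byRackHost.computeIfAbsent(o.rack, r -> new TreeMap<>())
                      .computeIfAbsent(o.host, h -> new ArrayList<>())
                      .add(o.partialSums);
        }

        List<Map<String, Long>> rackLevel = new ArrayList<>();
        for (Map<String, List<Map<String, Long>>> hosts : byRackHost.values()) {
            List<Map<String, Long>> hostLevel = new ArrayList<>();
            for (List<Map<String, Long>> perHost : hosts.values()) {
                hostLevel.add(combine(perHost));   // host-local R
            }
            rackLevel.add(combine(hostLevel));     // rack-local R
        }
        return combine(rackLevel);                 // final R
    }

    public static void main(String[] args) {
        List<MapOutput> outputs = List.of(
            new MapOutput("r1", "h1", Map.of("a", 1L, "b", 2L)),
            new MapOutput("r1", "h1", Map.of("a", 3L)),
            new MapOutput("r1", "h2", Map.of("b", 4L)),
            new MapOutput("r2", "h3", Map.of("a", 5L, "c", 6L)));
        System.out.println(combineTree(outputs));
    }
}
```

Note that the map side never calls combine() here; it only emits partial 
output, and each R level sees fewer, larger inputs than the one before it.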

Once the user is producing that DAG with custom edges, the complexity reduces 
to just the scheduling/event-routing for those edges, driven by the topology 
information in the Edge/Vertex managers.

> Support a combiner processor that can run non-local to map/reduce nodes
> -----------------------------------------------------------------------
>
>                 Key: TEZ-145
>                 URL: https://issues.apache.org/jira/browse/TEZ-145
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Hitesh Shah
>            Assignee: Tsuyoshi OZAWA
>              Labels: TEZ-1
>
> For aggregate operators that can benefit by running in multi-level trees, 
> support of being able to run a combiner in a non-local mode would allow 
> performance efficiencies to be gained by running a combiner at a rack-level. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)
