[ https://issues.apache.org/jira/browse/TEZ-145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14491620#comment-14491620 ]

Bikas Saha commented on TEZ-145:
--------------------------------

Right, as I said in a previous comment, the transducer needs to maintain 
partition boundaries while doing its work for this to be useful.
This would need a single vertex with its vertex manager (to do the rack-aware 
grouping) and a single EdgeManager that does the custom routing from grouped 
maps to their transducer. This would be a fairly asymmetric edge because of 
the arbitrary groupings.
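The grouping step the vertex manager would perform is simple to illustrate. Here is a minimal sketch in plain Java (deliberately not written against the Tez VertexManagerPlugin API; rack paths and task ids are invented for illustration) that buckets completed map tasks by rack so one transducer task could be assigned per rack:

```java
import java.util.*;

public class RackGrouping {
    // Group map task ids by the rack they ran on. The task-to-rack map
    // stands in for whatever topology information the vertex manager has.
    public static Map<String, List<Integer>> groupByRack(
            Map<Integer, String> taskToRack) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<Integer, String> e : taskToRack.entrySet()) {
            groups.computeIfAbsent(e.getValue(), k -> new ArrayList<>())
                  .add(e.getKey());
        }
        return groups;
    }
}
```

Each resulting group would then be wired, via the custom EdgeManager, to a single transducer task for that rack.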
I am not sure why pipelining is required for this. Essentially, we are 
introducing another vertex that does some partial grouping. In fact, it could 
be done today in user land without Tez changes, and we should be able to 
accomplish that in this jira. The completed map outputs are aggregated 
transparently for the next stage.
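Such a user-land transducer is essentially a partial, partition-preserving aggregation. A hypothetical sketch, assuming word-count-style (key, count) records tagged with their target partition (none of these names come from Tez itself): outputs stay separated by partition, so downstream reducers still receive exactly the partitions they would have received without the extra stage.

```java
import java.util.*;

public class PartitionedCombine {
    // Each record is {partition (Integer), key (String), count (Number)}.
    // Counts are summed per key, but only within a partition, so the
    // transducer never mixes data across partition boundaries.
    public static Map<Integer, Map<String, Long>> combine(
            List<Object[]> records) {
        Map<Integer, Map<String, Long>> out = new TreeMap<>();
        for (Object[] r : records) {
            int partition = (Integer) r[0];
            String key = (String) r[1];
            long count = ((Number) r[2]).longValue();
            out.computeIfAbsent(partition, p -> new TreeMap<>())
               .merge(key, count, Long::sum);
        }
        return out;
    }
}
```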
Where Tez support could be needed for efficiency is in being able to 
short-circuit this stage. Let's say the vertex manager figures out that the 
transducer stage is going to be useless (given data distribution, size, and 
latency). Then Tez could allow removing this stage from the DAG so that the 
real consumer stage can be started with no overhead.
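The short-circuit decision itself could be as simple as comparing the expected shuffle savings against the cost of running an extra stage. A purely illustrative heuristic (every parameter and threshold here is invented; a real vertex manager would use actual runtime statistics):

```java
public class StageShortCircuit {
    // Skip the rack-level transducer stage when the bytes it would save
    // from the shuffle do not outweigh the extra stage's latency cost,
    // expressed here as an equivalent byte budget for comparability.
    public static boolean shouldRunTransducer(long totalMapOutputBytes,
                                              double expectedReductionRatio,
                                              long stageLatencyCostBytes) {
        long savedBytes = (long) (totalMapOutputBytes * expectedReductionRatio);
        return savedBytes > stageLatencyCostBytes;
    }
}
```

If this returns false, the DAG would be rewired so the real consumer stage reads the map outputs directly.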

> Support a combiner processor that can run non-local to map/reduce nodes
> -----------------------------------------------------------------------
>
>                 Key: TEZ-145
>                 URL: https://issues.apache.org/jira/browse/TEZ-145
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Hitesh Shah
>            Assignee: Tsuyoshi Ozawa
>         Attachments: TEZ-145.2.patch, WIP-TEZ-145-001.patch
>
>
> For aggregate operators that can benefit by running in multi-level trees, 
> support of being able to run a combiner in a non-local mode would allow 
> performance efficiencies to be gained by running a combiner at a rack-level. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
