[jira] [Commented] (TEZ-145) Support a combiner processor that can run non-local to map/reduce nodes

Bikas Saha (JIRA) Wed, 18 Mar 2015 13:41:46 -0700

    [ 
https://issues.apache.org/jira/browse/TEZ-145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367849#comment-14367849
 ]


Bikas Saha commented on TEZ-145:
--------------------------------

I know what you are talking about, but let me restate it to check that we are 
on the same page. 
Combining can happen at multiple levels: task, host, rack, etc.
In theory, doing these combines requires maintaining partition boundaries at 
each combining level. However, if every task maintains partition boundaries by 
handling a single partition, there is a task explosion (== level-arity * 
partition count). Hence, an efficient multi-level combine operation needs to 
operate on multiple partitions per task at each level, so that a reasonable 
number of tasks can process a large number of partitions. This can be true 
even for the final reducer. That is partially what happens with auto-reduce, 
except that the tasks lose their partition boundaries.
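
To make the task-count argument concrete, here is a minimal sketch of the arithmetic. It is not Tez code. It assumes that "level-arity" means the number of combiner groups at a level (for example, the number of racks), and the function and parameter names are hypothetical.

```python
import math


def tasks_per_level(groups, partitions, partitions_per_task=1):
    """Number of combiner tasks needed at one combining level.

    groups              -- combiner groups at this level (assumed meaning of
                           "level-arity", e.g. the number of racks)
    partitions          -- total number of partitions
    partitions_per_task -- how many partitions a single task handles
    """
    # Each group needs enough tasks to cover every partition.
    return groups * math.ceil(partitions / partitions_per_task)


# One partition per task: the count grows as groups * partitions.
one_per_task = tasks_per_level(groups=20, partitions=1000)

# 100 partitions per task: the same work fits in far fewer tasks.
many_per_task = tasks_per_level(groups=20, partitions=1000, partitions_per_task=100)
```

With 20 groups and 1000 partitions, the first call gives 20000 tasks and the second gives 200.
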
If the processor can find a way to process multiple partitions while keeping 
them logically separate, then we could de-link physical tasks from physical 
partitioning. If the processor supports that, the edge manager can be set up 
to route N output/partition indices to the same task.
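
The routing idea above could look roughly like the following. This is an illustrative sketch only, not the Tez edge manager API. It assumes contiguous partition indices are grouped onto one task, and `route_partition`, `combine_by_task`, and their parameters are hypothetical names.

```python
from collections import defaultdict


def route_partition(partition_index, partitions_per_task):
    """Map a partition index to (task_index, slot_within_task).

    Assumption: N contiguous partition indices go to the same task.
    """
    return divmod(partition_index, partitions_per_task)


def combine_by_task(records, partitions_per_task, combine_fn):
    """Combine (partition_index, key, value) records per task.

    Each task holds several partitions, but values are only combined
    within the same partition, so the partitions stay logically separate.
    Returns {task_index: {partition_index: {key: combined_value}}}.
    """
    tasks = defaultdict(lambda: defaultdict(dict))
    for partition_index, key, value in records:
        task_index, _slot = route_partition(partition_index, partitions_per_task)
        bucket = tasks[task_index][partition_index]
        if key in bucket:
            bucket[key] = combine_fn(bucket[key], value)
        else:
            bucket[key] = value
    return tasks


records = [(0, "a", 1), (1, "a", 2), (0, "a", 3), (2, "b", 4)]
combined = combine_by_task(records, partitions_per_task=2,
                           combine_fn=lambda x, y: x + y)
```

Here partitions 0 and 1 land on task 0 and partition 2 lands on task 1. The key "a" is summed separately for partition 0 and partition 1, even though both are processed by the same task.
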

> Support a combiner processor that can run non-local to map/reduce nodes
> -----------------------------------------------------------------------
>
>                 Key: TEZ-145
>                 URL: https://issues.apache.org/jira/browse/TEZ-145
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Hitesh Shah
>            Assignee: Tsuyoshi Ozawa
>         Attachments: TEZ-145.2.patch, WIP-TEZ-145-001.patch
>
>
> For aggregate operators that can benefit by running in multi-level trees, 
> support of being able to run a combiner in a non-local mode would allow 
> performance efficiencies to be gained by running a combiner at a rack-level. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
