[ https://issues.apache.org/jira/browse/TEZ-145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357137#comment-14357137 ]

Tsuyoshi Ozawa commented on TEZ-145:
------------------------------------

[~bikassaha] [~hitesh] I prototyped CombineProcessor and 
TestOrderedWordCountWithCombineProcessor as an application. 

* CombineProcessor accepts OrderedGroupedInputLegacy and runs
Combiner#combineWithKVWriter.
* MRCombiner has a new method, combineWithKVWriter, that accepts a
KeyValueWriter instead of an IFile#Writer.
* TestOrderedWordCountWithCombineProcessor is an example application that
uses CombineProcessor. It finished in 3m40.838s, while TestOrderedWordCount
finished in 6m53.447s.
(Note that, as you know, this comparison is not fair:
TestOrderedWordCountWithCombineProcessor runs with 10 combiners, whereas
TestOrderedWordCount runs with only 2 reducers.)
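To make the combineWithKVWriter idea concrete, here is a minimal, self-contained sketch of the data flow the bullets describe: a processor reads grouped input (one key with all of its values) and hands each combined result to a key/value writer instead of an IFile#Writer. It does NOT use the real Tez classes; KVSink, GroupedSource, combineWithWriter and fromMap are hypothetical stand-ins, and it assumes a word-count style combiner that sums Long values per key.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CombineSketch {

    /** Hypothetical stand-in for a key/value writer sink (not the Tez API). */
    interface KVSink<K, V> {
        void write(K key, V value) throws IOException;
    }

    /** Hypothetical stand-in for a grouped input: one key plus all of its values. */
    interface GroupedSource<K, V> {
        boolean next() throws IOException;
        K getCurrentKey();
        Iterable<V> getCurrentValues();
    }

    /** Word-count style combine: sums each key's values and writes one record per key to the sink. */
    static void combineWithWriter(GroupedSource<String, Long> input,
                                  KVSink<String, Long> out) throws IOException {
        while (input.next()) {
            long sum = 0;
            for (long v : input.getCurrentValues()) {
                sum += v;
            }
            out.write(input.getCurrentKey(), sum);
        }
    }

    /** Builds an in-memory grouped source from a key -> values map (for illustration only). */
    static GroupedSource<String, Long> fromMap(Map<String, List<Long>> groups) {
        final Iterator<Map.Entry<String, List<Long>>> it = groups.entrySet().iterator();
        return new GroupedSource<String, Long>() {
            private Map.Entry<String, List<Long>> current;

            @Override
            public boolean next() {
                if (!it.hasNext()) {
                    return false;
                }
                current = it.next();
                return true;
            }

            @Override
            public String getCurrentKey() {
                return current.getKey();
            }

            @Override
            public Iterable<Long> getCurrentValues() {
                return current.getValue();
            }
        };
    }

    public static void main(String[] args) throws IOException {
        Map<String, List<Long>> groups = new LinkedHashMap<>();
        groups.put("apache", Arrays.asList(1L, 1L, 1L));
        groups.put("tez", Arrays.asList(2L, 3L));

        // Any writer-like sink works; here the combined records are collected into a map.
        Map<String, Long> combined = new LinkedHashMap<>();
        combineWithWriter(fromMap(groups), combined::put);
        System.out.println(combined); // {apache=3, tez=5}
    }
}
```

The point of decoupling the combiner from the writer type is that the same combine logic can then run either on the map side (writing to a local spill file) or inside a standalone processor (writing to a logical output).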

I would appreciate any feedback!

> Support a combiner processor that can run non-local to map/reduce nodes
> -----------------------------------------------------------------------
>
>                 Key: TEZ-145
>                 URL: https://issues.apache.org/jira/browse/TEZ-145
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Hitesh Shah
>            Assignee: Tsuyoshi Ozawa
>              Labels: gsoc, gsoc2015, hadoop, java, tez
>         Attachments: WIP-TEZ-145-001.patch
>
>
> For aggregate operators that can benefit by running in multi-level trees, 
> support of being able to run a combiner in a non-local mode would allow 
> performance efficiencies to be gained by running a combiner at a rack-level. 
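The multi-level tree in the issue description can be illustrated with a small sketch: node outputs are first merged by a rack-level combiner, and only the per-rack results are sent on to the final aggregation. This assumes a sum aggregate (associative and commutative, so merging in stages gives the same final answer); the rack topology, the words, and the class and method names are all hypothetical, and no Tez API is used. How much cross-rack traffic is saved depends on how many keys repeat across nodes in the same rack.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RackCombineSketch {

    /** Builds a word -> count map, standing in for one node's locally combined map output. */
    static Map<String, Long> counts(String... words) {
        Map<String, Long> m = new HashMap<>();
        for (String w : words) {
            m.merge(w, 1L, Long::sum);
        }
        return m;
    }

    /** Merges partial counts; used both for the rack-level combine and the final reduce. */
    static Map<String, Long> merge(List<Map<String, Long>> partials) {
        Map<String, Long> out = new HashMap<>();
        for (Map<String, Long> p : partials) {
            for (Map.Entry<String, Long> e : p.entrySet()) {
                out.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        return out;
    }

    /** Number of key/value records that would be shipped downstream. */
    static int records(List<Map<String, Long>> outputs) {
        int n = 0;
        for (Map<String, Long> m : outputs) {
            n += m.size();
        }
        return n;
    }

    public static void main(String[] args) {
        // Hypothetical topology: two racks with two nodes each.
        List<Map<String, Long>> rackA = Arrays.asList(counts("tez", "tez", "yarn"), counts("tez", "hdfs"));
        List<Map<String, Long>> rackB = Arrays.asList(counts("yarn", "hdfs"), counts("tez", "yarn"));

        // One level: every node ships its output directly to the final reducer.
        int direct = records(rackA) + records(rackB);

        // Two levels: a rack-level combiner merges its nodes first.
        List<Map<String, Long>> rackOutputs = Arrays.asList(merge(rackA), merge(rackB));
        int viaRack = records(rackOutputs);

        Map<String, Long> finalCounts = merge(rackOutputs);
        System.out.println("direct=" + direct + ", viaRack=" + viaRack); // direct=8, viaRack=6
        System.out.println("final=" + finalCounts);
    }
}
```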



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
