[
https://issues.apache.org/jira/browse/TEZ-145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14490647#comment-14490647
]
Tsuyoshi Ozawa edited comment on TEZ-145 at 4/11/15 1:08 AM:
-------------------------------------------------------------
[~bikassaha] [~gopalv] As Gopal mentioned, this feature can target 3 and 4.
Here is a benchmark result from a prototype of MAPREDUCE-4502:
http://www.slideshare.net/ozax86/prestrata-hadoop-word-meetup/11
On MAPREDUCE-4502, I tried running the combiner after tasks spill: this creates
a performance trade-off between aggregation ratio and disk I/O. So Gopal's
comment below makes sense to me.
{quote}
So tuning it to have no extra spills produced bad shuffle performance, which is
what the Tez approach is not vulnerable to, since it is meant to combine
host-local data (plus skip merges via pipelining).
{quote}
If we can implement an in-memory combiner, or that kind of DAG support, in the
Tez layer, we can improve performance further. However, we would need to change
the fault-tolerance semantics to support the feature, since fault tolerance
would no longer be task-level in this case.
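
To illustrate why combining host-local data improves the aggregation ratio, here is a minimal sketch in plain Python. It is not Tez or MapReduce code: the function names (combine, host_local_combine) and the word-count style sample data are hypothetical, and it assumes the combiner is a simple sum over values that share a key.

```python
from collections import defaultdict


def combine(records, combiner=sum):
    """Group (key, value) records by key and apply the combiner to each group."""
    groups = defaultdict(list)
    for key, value in records:
        groups[key].append(value)
    return [(key, combiner(values)) for key, values in sorted(groups.items())]


def host_local_combine(task_outputs, combiner=sum):
    """Combine the outputs of all tasks that ran on one host in a single pass."""
    merged = [record for output in task_outputs for record in output]
    return combine(merged, combiner)


# Hypothetical outputs from three map tasks that ran on the same host.
task_outputs = [
    [("a", 1), ("b", 1), ("a", 1)],
    [("a", 1), ("c", 1)],
    [("b", 1), ("b", 1), ("c", 1)],
]

# Task-level combining: each task only sees its own records.
per_task = [combine(output) for output in task_outputs]
# Host-level combining: keys shared across tasks collapse into one record.
per_host = host_local_combine(task_outputs)

records_in = sum(len(output) for output in task_outputs)
records_per_task = sum(len(output) for output in per_task)
print("input records:", records_in)
print("shipped after per-task combine:", records_per_task)
print("shipped after host-level combine:", len(per_host))
```

In this sketch the host-level pass ships fewer records to the shuffle than per-task combining, because keys that occur in several tasks are merged once. It also shows the fault-tolerance issue: the combined output depends on several tasks, so losing it cannot be repaired by rerunning a single task.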
> Support a combiner processor that can run non-local to map/reduce nodes
> -----------------------------------------------------------------------
>
> Key: TEZ-145
> URL: https://issues.apache.org/jira/browse/TEZ-145
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Hitesh Shah
> Assignee: Tsuyoshi Ozawa
> Attachments: TEZ-145.2.patch, WIP-TEZ-145-001.patch
>
>
> For aggregate operators that benefit from running in multi-level trees,
> support for running a combiner in non-local mode would allow performance
> gains from running the combiner at the rack level.