[
https://issues.apache.org/jira/browse/TEZ-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14326419#comment-14326419
]
Gopal V commented on TEZ-2001:
------------------------------
bq. Are we looking to address the memory overhead in this jira, or in a follow
up ?
The eventing model overheads are going to be a follow-up definitely.
That is a stats-quo problem, which is usually engineered by tuning splits till
it is 1 spill-per-task. The added overheads are no different from spending a
few hours of careful balancing to get an unskewed run today.
This isn't making it any worse - in fact, unskewed runs are the best case
scenario which works well today.
The crux of this JIRA is generation of & the downstream failure tolerance of
the unmerged data.
And the important idea that we are worried about are downstream task's
tolerance - the upstream tasks completing successfully means "nothing has
failed yet", without giving any additional information about the near future.
Production of all spills on a machine does not guarantee availability to all
reducers - an NM crash or network partition will trigger the exact same
re-execution codepath as a map task failure during pipelined shuffle.
In case of deterministic spills (like terasort), we can keep reducer state and
pull the remainder off the new execution.
In case of non-deterministic spills (hashmap aggregations), we are forced to
drop all reducer state. This is due to the fact that to prevent massive number
of tiny files on the reducer disk, the disk spill on a reducer is always via a
merge, so trying to hold unmerged in-memory inputs would eventually occupy all
the memory with the MergeManager stuck in WAIT.
These two scenarios are identical whether we fire 1 composite event (as bikas
suggested) or individual events.
That is because these will result in un-coordinated fetches off the shuffle
handler, due to the independence of the checksum streams of IFiles of different
spills.
Optimizations which improve over the status quo of manual tuned runs can come
after testing this out on a failure-injected cluster with a bad node always
failing the nth spill shuffles (retries, black-listing etc).
> Support pipelined data transfer for ordered output
> --------------------------------------------------
>
> Key: TEZ-2001
> URL: https://issues.apache.org/jira/browse/TEZ-2001
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Rajesh Balamohan
> Assignee: Rajesh Balamohan
> Attachments: TEZ-2001.1.patch, TEZ-2001.2.patch
>
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)