[
https://issues.apache.org/jira/browse/TEZ-3103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jason Lowe updated TEZ-3103:
----------------------------
Attachment: TEZ-3103.002.patch
Oops, I attached the wrong version of the patch; the test case in the original
was flaky. Attaching a new version which should have a more reliable unit test.
> Shuffle can hang when memory to memory merging enabled
> ------------------------------------------------------
>
> Key: TEZ-3103
> URL: https://issues.apache.org/jira/browse/TEZ-3103
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.7.0
> Reporter: Jason Lowe
> Assignee: Jason Lowe
> Priority: Critical
> Attachments: TEZ-3103.001.patch, TEZ-3103.002.patch
>
>
> The shuffle process can hang when memory-to-memory merging is enabled. As
> the memory-to-memory merge progresses, it closes out the input segments, which
> in turn lowers the commitMemory associated with those segments. However, when
> the merge completes, it fails to increase the commitMemory accordingly for the
> resulting merged segment. This effectively "leaks" shuffle memory, and we
> can end up in a situation where there's insufficient memory to perform any
> more in-memory shuffles but commitMemory is too low to trigger a merge. All
> the fetcher threads eventually end up waiting on a merge that will never
> occur, and the shuffle hangs.
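
The accounting problem described above can be illustrated with a toy model.
This is only a sketch and not the actual Tez merge manager code: the class
name, the usedMemory counter, the limit and threshold values, and all method
names are hypothetical, chosen just to show how failing to credit commitMemory
for the merged segment could leave fetchers waiting on a merge that is never
triggered.

```java
/** Toy model of shuffle memory accounting. Not the real Tez implementation. */
class CommitMemorySketch {
    // Hypothetical numbers, picked only to make the scenario easy to follow.
    static final long MEMORY_LIMIT = 100;   // total in-memory shuffle budget
    static final long MERGE_THRESHOLD = 90; // commitMemory level that triggers a merge

    long usedMemory = 0;    // memory held by segments currently in memory (assumed counter)
    long commitMemory = 0;  // memory of completed segments that count toward a merge
    final boolean creditMergedOutput; // false models the bug, true models the fix

    CommitMemorySketch(boolean creditMergedOutput) {
        this.creditMergedOutput = creditMergedOutput;
    }

    /** A fetcher can only shuffle in memory if the segment fits in the budget. */
    boolean canReserve(long size) {
        return usedMemory + size <= MEMORY_LIMIT;
    }

    /** A fetched segment completes and becomes eligible for merging. */
    void commitFetched(long size) {
        usedMemory += size;
        commitMemory += size;
    }

    /** Memory-to-memory merge: inputs are closed, one merged segment stays in memory. */
    void memToMemMerge(long... inputSizes) {
        long mergedSize = 0;
        for (long size : inputSizes) {
            // Closing an input segment lowers both counters.
            usedMemory -= size;
            commitMemory -= size;
            mergedSize += size;
        }
        usedMemory += mergedSize; // the merged output still occupies memory
        if (creditMergedOutput) {
            commitMemory += mergedSize; // the step the buggy path omits
        }
    }

    /** Fetchers hang if nothing fits and commitMemory cannot trigger a merge. */
    boolean isStuck(long nextSegmentSize) {
        return !canReserve(nextSegmentSize) && commitMemory < MERGE_THRESHOLD;
    }

    /** Runs the same scenario with or without the credit and reports a hang. */
    static boolean simulate(boolean creditMergedOutput) {
        CommitMemorySketch m = new CommitMemorySketch(creditMergedOutput);
        m.commitFetched(20);
        m.commitFetched(20);
        m.commitFetched(20);
        m.memToMemMerge(20, 20, 20); // merge the three fetched segments in memory
        m.commitFetched(20);
        m.commitFetched(20);         // usedMemory is now at the limit
        return m.isStuck(20);
    }

    public static void main(String[] args) {
        System.out.println("buggy accounting stuck: " + simulate(false));
        System.out.println("fixed accounting stuck: " + simulate(true));
    }
}
```

In this model the buggy path ends with usedMemory at 100 but commitMemory at
only 40, so no further segment fits and the merge threshold is never reached.
With the credit applied, commitMemory reaches 100 and a merge would be
triggered, freeing memory for the waiting fetchers.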
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)