[
https://issues.apache.org/jira/browse/TEZ-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14380398#comment-14380398
]
Siddharth Seth commented on TEZ-2214:
-------------------------------------
I think both patches, .2 and .3, are good, as long as no other entity is
reserving memory. The cases to watch are the MemToMemMerger, which may just
become a little more complicated, and data via events, if we ever support that.
A fetcher will always trigger the MemToDiskMerger, then wait on
waitForInMemoryMerge, followed by waitForShuffleToMergeMemory. If the data
fetched by this fetcher triggered a merge, it will always wait and re-check
whether another merge is required. If the data fetched did not trigger a merge
(and no merge was in progress), memory limits haven't been hit, and a future
fetch would trigger one.
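
To make the re-check argument concrete, here is a minimal single-threaded sketch of the accounting. It is not the Tez MergeManager: the class name, the methods, and the numbers (limit 100, threshold 60) are all made up for illustration, and the re-check is placed at merge completion rather than in the waiting fetcher to keep the model small. It replays an interleaving like the one in the issue description, once without and once with the re-check.

```java
/**
 * Hypothetical single-threaded model of shuffle memory accounting.
 * Field names echo the issue text; this is NOT Tez's MergeManager.
 */
public class MergeRecheckSketch {
    private final long memoryLimit;    // reservations stall while usedMemory exceeds this
    private final long mergeThreshold; // committed bytes needed to start a merge
    private long usedMemory;           // bytes reserved by fetchers and not yet released
    private long commitMemory;         // fetched bytes waiting for a merge
    private long mergingBytes;         // bytes still held by the running merge
    private boolean mergeInProgress;
    private int mergesStarted;

    public MergeRecheckSketch(long memoryLimit, long mergeThreshold) {
        this.memoryLimit = memoryLimit;
        this.mergeThreshold = mergeThreshold;
    }

    /** A fetcher asks for memory; false means it has to stall. */
    public boolean reserve(long size) {
        if (usedMemory > memoryLimit) {
            return false;
        }
        usedMemory += size;
        return true;
    }

    /** A fetcher finished copying; without the re-check this is the only merge trigger. */
    public void commit(long size) {
        commitMemory += size;
        maybeStartMerge();
    }

    /** The running merge flushes part of its input and frees that memory. */
    public void releaseFromMerge(long size) {
        long freed = Math.min(size, mergingBytes);
        mergingBytes -= freed;
        usedMemory -= freed;
    }

    /** The merge completes; with recheck=true, look for a merge that was skipped. */
    public void finishMerge(boolean recheck) {
        releaseFromMerge(mergingBytes);
        mergeInProgress = false;
        if (recheck) {
            maybeStartMerge();
        }
    }

    private void maybeStartMerge() {
        if (!mergeInProgress && commitMemory >= mergeThreshold) {
            mergeInProgress = true;
            mergingBytes = commitMemory;
            commitMemory = 0;
            mergesStarted++;
        }
    }

    /** No merge is running and no fetcher can reserve, so nothing will free memory. */
    public boolean isStuck() {
        return !mergeInProgress && usedMemory > memoryLimit;
    }

    public int mergesStarted() {
        return mergesStarted;
    }

    /** Replays fast fetchers committing data while a merge is still flushing. */
    public static MergeRecheckSketch replay(boolean recheck) {
        MergeRecheckSketch m = new MergeRecheckSketch(100, 60);
        m.reserve(60); m.commit(60); // merge 1 starts and holds 60 bytes
        m.reserve(40); m.commit(40); // used=100, commit=40
        m.releaseFromMerge(30);      // used=70, merge still running
        m.reserve(40); m.commit(40); // used=110, commit=80, trigger skipped
        m.releaseFromMerge(20);      // used=90
        m.reserve(40); m.commit(40); // used=130, commit=120
        m.finishMerge(recheck);      // merge frees its last 10 bytes: used=120
        return m;
    }

    public static void main(String[] args) {
        System.out.println("without re-check, stuck: " + replay(false).isStuck());
        System.out.println("with re-check, stuck: " + replay(true).isStuck());
    }
}
```

In this model the run without the re-check ends with usedMemory above the limit and no merge running, which is the stalled state. With the re-check, a second merge starts as soon as the first one completes.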
> FetcherOrderedGrouped can get stuck indefinitely when MergeManager misses
> memToDiskMerging
> ------------------------------------------------------------------------------------------
>
> Key: TEZ-2214
> URL: https://issues.apache.org/jira/browse/TEZ-2214
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Rajesh Balamohan
> Assignee: Rajesh Balamohan
> Attachments: TEZ-2214.1.patch, TEZ-2214.2.patch, TEZ-2214.3.patch
>
>
> Scenario:
> - commitMemory & usedMemory are beyond their allowed threshold.
> - InMemoryMerge kicks off and is in the process of flushing memory contents
> to disk
> - As it progresses, it releases memory segments as well (though the merge has
> not finished yet).
> - Fetchers that need memory < maxSingleShuffleLimit get scheduled.
> - If fetchers are fast, this quickly adds up to commitMemory & usedMemory.
> Since InMemoryMerge is already in progress, this wouldn't trigger another
> merge().
> - Pretty soon all fetchers would be stalled and get into the following state.
> {noformat}
> Thread 9351: (state = BLOCKED)
>  - java.lang.Object.wait(long) @bci=0 (Compiled frame; information may be imprecise)
>  - java.lang.Object.wait() @bci=2, line=502 (Compiled frame)
>  - org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.waitForShuffleToMergeMemory() @bci=17, line=337 (Interpreted frame)
>  - org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.run() @bci=34, line=157 (Interpreted frame)
> {noformat}
> - Even if InMemoryMerger completes, commitMemory & usedMemory are still beyond
> their thresholds, and no fetcher threads (all are stalled) remain to release
> memory. This causes fetchers to wait indefinitely.