[jira] [Commented] (TEZ-2214) FetcherOrderedGrouped can get stuck indefinitely when MergeManager misses memToDiskMerging

Siddharth Seth (JIRA) Tue, 24 Mar 2015 19:40:50 -0700

    [ 
https://issues.apache.org/jira/browse/TEZ-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14379179#comment-14379179
 ]


Siddharth Seth commented on TEZ-2214:
-------------------------------------

I was looking at the .1 patch. The latest patch addresses the sync / visibility 
issue.
Question: couldn't this same block just as well have been placed in the 
waitForInMemoryMerge method? Essentially, it could go in any place where it 
would be triggered after a merge completes.

> FetcherOrderedGrouped can get stuck indefinitely when MergeManager misses 
> memToDiskMerging
> ------------------------------------------------------------------------------------------
>
>                 Key: TEZ-2214
>                 URL: https://issues.apache.org/jira/browse/TEZ-2214
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>         Attachments: TEZ-2214.1.patch, TEZ-2214.2.patch
>
>
> Scenario:
> - commitMemory & usedMemory are beyond their allowed thresholds.
> - InMemoryMerge kicks off and is in the process of flushing memory contents 
> to disk.
> - As it progresses, it releases memory segments as well (but the merge is 
> not yet complete).
> - Fetchers that need less memory than maxSingleShuffleLimit get scheduled.
> - If fetchers are fast, this quickly adds to commitMemory & usedMemory. 
> Since InMemoryMerge is already in progress, this doesn't trigger another 
> merge().
> - Pretty soon all fetchers stall and end up in the following state:
> {noformat}
> Thread 9351: (state = BLOCKED)
>  - java.lang.Object.wait(long) @bci=0 (Compiled frame; information may be imprecise)
>  - java.lang.Object.wait() @bci=2, line=502 (Compiled frame)
>  - org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.waitForShuffleToMergeMemory() @bci=17, line=337 (Interpreted frame)
>  - org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.run() @bci=34, line=157 (Interpreted frame)
> {noformat}
> - Even if InMemoryMerger completes, commitMemory & usedMemory are still 
> beyond their thresholds, and no fetcher threads are left to release memory 
> (all of them are stalled). This causes fetchers to wait indefinitely.
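
For readers following the quoted scenario, here is a minimal single-threaded sketch of how the stall can arise, and of why re-checking the merge condition once a merge completes avoids it. It is a simplified model, not the Tez MergeManager and not the attached patch: the class, its methods, and the numbers (limit 100, threshold 60, 40-unit fetches) are all hypothetical, and only the names commitMemory and usedMemory come from the description above.

```java
// Simplified, single-threaded model of the stall described in the scenario.
// This is NOT the Tez MergeManager; everything except the names
// commitMemory and usedMemory is hypothetical.
final class MergeStallSketch {
    private final long memoryLimit;     // fetchers stall once usedMemory exceeds this
    private final long mergeThreshold;  // a merge starts once commitMemory reaches this
    private final boolean recheckAfterMerge;

    private long usedMemory;
    private long commitMemory;
    private boolean mergeInProgress;

    MergeStallSketch(long memoryLimit, long mergeThreshold, boolean recheckAfterMerge) {
        this.memoryLimit = memoryLimit;
        this.mergeThreshold = mergeThreshold;
        this.recheckAfterMerge = recheckAfterMerge;
    }

    /** A fetcher asks for memory; false means it has to stall. */
    boolean reserve(long size) {
        if (usedMemory > memoryLimit) {
            return false;
        }
        usedMemory += size;
        return true;
    }

    /** A fetcher finished copying; this is the only normal merge trigger. */
    void commit(long size) {
        commitMemory += size;
        maybeStartMerge();
    }

    /** The running merge flushed a segment to disk and released its memory. */
    void mergeReleased(long size) {
        usedMemory -= size;
    }

    /** The running merge completed. */
    void mergeFinished() {
        mergeInProgress = false;
        if (recheckAfterMerge) {
            maybeStartMerge(); // assumed fix: re-evaluate the trigger after a merge
        }
    }

    /** Fetchers cannot reserve and no merge is running to free memory. */
    boolean isStuck() {
        return usedMemory > memoryLimit && !mergeInProgress;
    }

    private void maybeStartMerge() {
        if (!mergeInProgress && commitMemory >= mergeThreshold) {
            mergeInProgress = true;
            commitMemory = 0;
        }
    }

    private static void fetch(MergeStallSketch m, long size) {
        if (m.reserve(size)) {
            m.commit(size);
        }
    }

    static boolean runScenario(boolean recheckAfterMerge) {
        MergeStallSketch m = new MergeStallSketch(100, 60, recheckAfterMerge);
        fetch(m, 40);         // used=40,  commit=40
        fetch(m, 40);         // used=80,  commit=80 -> merge starts, commit=0
        m.mergeReleased(40);  // used=40
        fetch(m, 40);         // used=80,  commit=40
        fetch(m, 40);         // used=120, commit=80; merge running, no new merge
        m.mergeReleased(40);  // used=80
        fetch(m, 40);         // used=120, commit=120
        m.mergeFinished();    // nothing left for this merge to release
        return m.isStuck();
    }

    public static void main(String[] args) {
        System.out.println("stuck without re-check: " + runScenario(false));
        System.out.println("stuck with re-check:    " + runScenario(true));
    }
}
```

With these numbers the model ends up stuck when the trigger is evaluated only on commit, and not stuck when it is evaluated again after the merge finishes. Where that re-check belongs in the real code is the open question raised in the comment.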



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
