[ 
https://issues.apache.org/jira/browse/TEZ-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14379172#comment-14379172
 ] 

Siddharth Seth commented on TEZ-2214:
-------------------------------------

[~rajesh.balamohan] - I'm trying to understand the scenario a little better.

bq. Fetchers who need memory < maxSingleShuffleLimit, get scheduled.
Won't the fetchers first block on merger.waitForInMemoryMerge, and then on 
merger.waitForShuffleToMergeMemory() ?
That'll happen to fetchers which aren't currently active - or for the ones 
where the MergeManager returns a WAIT.

It's possible for fetchers which already have an active list to keep going - 
and get memory as it is released by the mergeThread - or just get memory 
because some is available. Is this the situation which can cause the race ?
If the merge threshold is > 50% - won't there always be capacity available for 
a single mergeToMem (after the MemToDiskMerger completes) - which will then 
trigger another merge? The fact that we allow a single fetch to go over the 
memory limit probably complicates this - the last fetch puts usedMemory 
over 100%. If the last release from the merger doesn't bring it back below 
100%, everything gets stuck.
I think the same last-fetch problem applies to a merge threshold of < 50% as 
well.
Other than 'usedMemory' not going below the memoryLimit right after the 
InMemoryMerger completes, are there any other scenarios in which this will be 
triggered ?
If I'm not mistaken - for Tez 0.4, this would manifest as a tight loop on 
MergeManager.reserve returning a WAIT.
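
To make the arithmetic concrete, here is a rough sketch of the sequence I have in mind. It is a hypothetical, single-threaded model and not the actual MergeManager code: the class name, the limit of 100, the threshold of 60, and the assumption that each release from the merger reduces both usedMemory and commitMemory are all made up for illustration.

```java
// Hypothetical, simplified model of the accounting discussed above.
// This is NOT the Tez MergeManager; names and numbers are illustrative only.
final class ShuffleMemoryModel {
    private final long memoryLimit;
    private final long mergeThreshold;
    private long usedMemory;
    private long commitMemory;
    private boolean mergeInProgress;

    ShuffleMemoryModel(long memoryLimit, long mergeThreshold) {
        this.memoryLimit = memoryLimit;
        this.mergeThreshold = mergeThreshold;
    }

    // Returns false (WAIT) only if usedMemory is already over the limit,
    // so a single fetch is allowed to overshoot it.
    boolean reserve(long size) {
        if (usedMemory > memoryLimit) {
            return false;
        }
        usedMemory += size;
        return true;
    }

    // A fetch completed. A merge is started only if none is running.
    void commit(long size) {
        commitMemory += size;
        if (!mergeInProgress && commitMemory >= mergeThreshold) {
            mergeInProgress = true;
        }
    }

    // The running merge releases one segment (assumed to reduce both counters).
    void release(long size) {
        usedMemory -= size;
        commitMemory -= size;
    }

    void mergeDone() {
        mergeInProgress = false;
    }

    // No merge is running and every reserve() would return WAIT.
    boolean isStuck() {
        return !mergeInProgress && usedMemory > memoryLimit;
    }

    long usedMemory() {
        return usedMemory;
    }

    private void fetch(long size) {
        if (reserve(size)) {
            commit(size);
        }
    }

    static ShuffleMemoryModel replayScenario() {
        ShuffleMemoryModel m = new ShuffleMemoryModel(100, 60);
        m.fetch(30); m.fetch(25); m.fetch(5); // commitMemory = 60, merge starts
        m.fetch(30); m.fetch(30);             // usedMemory = 120 (one overshoot)
        m.release(30); m.fetch(30);           // 90, then back to 120
        m.release(25); m.fetch(30);           // 95, then 125
        m.release(5);                         // last release: 120, still over 100
        m.mergeDone();                        // nothing re-checks commitMemory
        return m;
    }
}
```

In this model, replayScenario() ends with usedMemory at 120 and no merge running, so every later reserve() returns WAIT and nothing is left to start the next merge.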


On the patch: 
Removing synchronization on waitForShuffleToMergeMemory leads to visibility 
issues for 'commitMemory'. This could be invoked by all Fetchers, and there's 
no guarantee that the threads read the latest value. Also it's possible for 
the currently running merge to complete (thus reducing the commitMemory) 
between the time the commitMemory is checked and the next merge is triggered - 
which could result in a merge being triggered before hitting the memory limit.
Otherwise I think the approach works.
If the above case is correct - should the check be inside of usedMemory > 
memoryLimit ?
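
A minimal sketch of what I mean, assuming the check and the trigger are done under one lock. Everything here (MergeTrigger, maybeStartMerge, the field layout) is hypothetical and only illustrates the check-then-act concern; it is not what the patch or the MergeManager does.

```java
// Hypothetical sketch: the state is read and the merge is triggered under
// the same monitor, so callers see current values and a merge completing
// concurrently cannot slip in between the check and the trigger.
final class MergeTrigger {
    private final Object lock = new Object();
    private final long memoryLimit;
    private final long mergeThreshold;
    private long usedMemory;
    private long commitMemory;
    private boolean mergeInProgress;

    MergeTrigger(long memoryLimit, long mergeThreshold) {
        this.memoryLimit = memoryLimit;
        this.mergeThreshold = mergeThreshold;
    }

    void onFetchCommitted(long size) {
        synchronized (lock) {
            usedMemory += size;
            commitMemory += size;
        }
    }

    void onMergeComplete(long released) {
        synchronized (lock) {
            usedMemory -= released;
            commitMemory -= released;
            mergeInProgress = false;
            lock.notifyAll(); // wake any fetchers waiting on this monitor
        }
    }

    // Returns true if this call started a merge. The commitMemory check is
    // nested inside the usedMemory > memoryLimit check, which is one reading
    // of the question above.
    boolean maybeStartMerge() {
        synchronized (lock) {
            if (usedMemory > memoryLimit) {
                if (!mergeInProgress && commitMemory >= mergeThreshold) {
                    mergeInProgress = true;
                    return true;
                }
            }
            return false;
        }
    }
}
```

With this shape a merge is never started while usedMemory is still at or below the limit, at the cost of every fetcher taking the lock on this path.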

Another option would be to have the merger check if another merge is required 
when it completes. That gets messy though - and will likely get in the way of 
the MemToMemMerger in the future. A callback from the merge threads may be a 
better option - to keep the merge threads clean.
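
Roughly what I have in mind for the callback, as a sketch only. MergeCompletionListener, SketchMerger and SketchManager are invented names, and the real merge threads would do actual work where the comment is.

```java
// Hypothetical callback shape: the merge thread only reports completion,
// and the manager decides whether another merge is needed.
interface MergeCompletionListener {
    void onMergeCompleted(long bytesReleased);
}

final class SketchMerger {
    private final MergeCompletionListener listener;

    SketchMerger(MergeCompletionListener listener) {
        this.listener = listener;
    }

    void merge(long bytes) {
        // ... the actual merge work would happen here ...
        listener.onMergeCompleted(bytes);
    }
}

final class SketchManager implements MergeCompletionListener {
    private final long mergeThreshold;
    private long commitMemory;
    private boolean mergeInProgress;
    private int mergesRequested;

    SketchManager(long mergeThreshold) {
        this.mergeThreshold = mergeThreshold;
    }

    synchronized void commit(long size) {
        commitMemory += size;
        maybeRequestMerge();
    }

    @Override
    public synchronized void onMergeCompleted(long bytesReleased) {
        commitMemory -= bytesReleased;
        mergeInProgress = false;
        maybeRequestMerge(); // re-check here so a pending merge is not missed
        notifyAll();         // wake fetchers waiting on the manager
    }

    // Only called from synchronized methods.
    private void maybeRequestMerge() {
        if (!mergeInProgress && commitMemory >= mergeThreshold) {
            mergeInProgress = true;
            mergesRequested++;
        }
    }

    synchronized int mergesRequested() {
        return mergesRequested;
    }
}
```

The merger stays unaware of thresholds; all of the "is another merge required" logic lives in the manager's callback.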

I was looking at the MapReduce code - that sets commitMemory to 0 the moment a 
merge starts. I don't think that fixes this particular race.


> FetcherOrderedGrouped can get stuck indefinitely when MergeManager misses 
> memToDiskMerging
> ------------------------------------------------------------------------------------------
>
>                 Key: TEZ-2214
>                 URL: https://issues.apache.org/jira/browse/TEZ-2214
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>         Attachments: TEZ-2214.1.patch, TEZ-2214.2.patch
>
>
> Scenario:
> - commitMemory & usedMemory are beyond their allowed threshold.
> - InMemoryMerge kicks off and is in the process of flushing memory contents 
> to disk
> - As it progresses, it releases memory segments as well (but not yet over).
> - Fetchers who need memory < maxSingleShuffleLimit, get scheduled.
> - If fetchers are fast, this quickly adds up to commitMemory & usedMemory. 
> Since InMemoryMerge is already in progress, this wouldn't trigger another 
> merge().
> - Pretty soon all fetchers would be stalled and get into the following state.
> {noformat}
> Thread 9351: (state = BLOCKED)
>  - java.lang.Object.wait(long) @bci=0 (Compiled frame; information may be imprecise)
>  - java.lang.Object.wait() @bci=2, line=502 (Compiled frame)
>  - org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.waitForShuffleToMergeMemory() @bci=17, line=337 (Interpreted frame)
>  - org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.run() @bci=34, line=157 (Interpreted frame)
> {noformat}
> - Even if InMemoryMerger completes, commitMemory & usedMemory are still 
> beyond their thresholds, and no other fetcher threads (all are stalled) are 
> there to release memory. This causes fetchers to wait indefinitely.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
