[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403464#comment-13403464
 ] 

Kihwal Lee commented on MAPREDUCE-4384:
---------------------------------------

When {{TestIndexCache}} failed, the log contained a warning message, "Map 
IDxxxx not found in queue!!". The queue is used to decide what to evict under 
the FIFO cache replacement policy. This message indicates that the cache 
entry was freed by a {{removeMap()}} call, but the corresponding entry was not 
found in the queue.

This can happen if {{removeMap()}} is called while the cache entry is being 
loaded. If a new incomplete entry is added to the cache between 
{{cache.get(mapId)}} and {{cache.remove(mapId)}} in {{removeMap()}}, the new 
entry will be removed from the cache. Further, because {{removeMap()}} updates 
{{totalMemoryUsed}} before the entry is fully loaded, it ends up subtracting 
zero from the usage. When loading completes in {{readIndexFileToCache()}}, 
{{totalMemoryUsed}} is incremented, but since the entry was already removed 
from the cache, there is no way it can be decremented. Hence the discrepancy 
in memory usage tracking.
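
To make the interleaving concrete, here is a minimal single-threaded replay of 
the steps described above. It is only a sketch: {{Entry}}, the map ID and the 
size of 100 are made up for illustration, and the real {{IndexCache}} code is 
more involved.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class AccountingDriftSketch {
    /** Hypothetical stand-in for IndexInformation; size is 0 until loaded. */
    static class Entry {
        volatile int size = 0;
    }

    /** Replays the interleaving; returns {entries left in cache, totalMemoryUsed}. */
    static int[] replay() {
        ConcurrentHashMap<String, Entry> cache = new ConcurrentHashMap<>();
        AtomicInteger totalMemoryUsed = new AtomicInteger();
        String mapId = "map_0001"; // made-up ID

        // Remover: the first check finds no entry, so it does not return early.
        Entry seen = cache.get(mapId); // null at this point

        // Loader: publishes an incomplete entry (size still 0).
        Entry loading = new Entry();
        cache.put(mapId, loading);

        // Remover: removes the incomplete entry and subtracts its size, which is 0.
        Entry removed = cache.remove(mapId);
        if (removed != null) {
            totalMemoryUsed.addAndGet(-removed.size);
        }

        // Loader: finishes loading and accounts for the memory.
        loading.size = 100;
        totalMemoryUsed.addAndGet(loading.size);

        return new int[] { cache.size(), totalMemoryUsed.get() };
    }

    public static void main(String[] args) {
        int[] r = replay();
        System.out.println("entries=" + r[0] + " totalMemoryUsed=" + r[1]);
    }
}
```

After the replay the cache is empty but the counter still reports 100, which 
is the kind of drift the test trips over.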

This issue can be fixed by adding one more condition to the first check in 
{{removeMap()}}:

{noformat}
   IndexInformation info = cache.get(mapId);
 - if ((info != null) && (info.getSize() == 0)) {
 + if (info == null || ((info != null) && (info.getSize() == 0))) {
     return;
   }
{noformat}
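
As a rough illustration of the effect of this extra condition, the sketch 
below applies the same guard to a simplified cache. {{Entry}} and the field 
layout are assumptions made for the example, not the actual {{IndexCache}} 
code, and the condition is written in the logically equivalent short form 
{{info == null || info.getSize() == 0}}.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class GuardedRemoveSketch {
    /** Hypothetical stand-in for IndexInformation; size is 0 until loaded. */
    static class Entry {
        volatile int size = 0;
        int getSize() { return size; }
    }

    final ConcurrentHashMap<String, Entry> cache = new ConcurrentHashMap<>();
    final AtomicInteger totalMemoryUsed = new AtomicInteger();

    /** Returns true only if an entry was removed and its size subtracted. */
    boolean removeMap(String mapId) {
        Entry info = cache.get(mapId);
        // Skip both a missing entry and one that is still being loaded.
        if (info == null || info.getSize() == 0) {
            return false;
        }
        info = cache.remove(mapId);
        if (info != null) {
            totalMemoryUsed.addAndGet(-info.getSize());
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        GuardedRemoveSketch c = new GuardedRemoveSketch();
        Entry e = new Entry();
        c.cache.put("map_0001", e);                   // incomplete entry
        System.out.println(c.removeMap("map_0001"));  // guard skips it
        e.size = 100;                                 // loading completes
        c.totalMemoryUsed.addAndGet(100);
        System.out.println(c.removeMap("map_0001"));  // now removed
        System.out.println(c.totalMemoryUsed.get());  // accounting balanced
    }
}
```

With the guard, a removal that arrives during loading leaves the entry alone, 
so the later increment is matched by a later decrement.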


Another potential issue is in {{readIndexFileToCache()}}. When two different 
threads are trying to add the same entry to the cache, there can be a deadlock. 
When Thread A puts a new {{IndexInformation}} object in the cache, Thread B 
can come in a bit late and call {{wait()}} on this object until it is fully 
ready. The {{wait()}} is inside the {{synchronized(info)}} block, and {{info}} 
is the new object it just found in the cache. But Thread A also tries to update 
the same object and call {{notifyAll()}} inside a synchronized() block on it. 
This results in a deadlock.
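
For reference, the {{wait()}}/{{notifyAll()}} pattern described above looks 
roughly like the following sketch. {{Entry}} is a hypothetical stand-in for 
{{IndexInformation}}, and the calling thread plays the role of Thread A. One 
thing to keep in mind when evaluating this scenario is that {{Object.wait()}} 
releases the monitor of the object while the caller is blocked, so in this 
simplified form Thread A can still enter its own synchronized block. The real 
code may differ in ways this sketch does not capture.

```java
class WaitNotifySketch {
    /** Hypothetical stand-in for IndexInformation; size 0 means "not ready". */
    static class Entry {
        int size = 0; // only read or written while holding the entry's monitor
    }

    /** Returns true if the waiting thread finished within the timeout. */
    static boolean handshakeCompletes() {
        final Entry info = new Entry();

        // Thread B: found the entry in the cache and waits for it to be ready.
        Thread threadB = new Thread(() -> {
            synchronized (info) {
                while (info.size == 0) {
                    try {
                        info.wait(); // releases info's monitor while blocked
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
            }
        });
        threadB.start();

        try {
            Thread.sleep(100); // best effort: let Thread B reach wait() first
            // Thread A (here, the calling thread): finish loading and notify.
            synchronized (info) {
                info.size = 100;
                info.notifyAll();
            }
            threadB.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !threadB.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("handshake completed: " + handshakeCompletes());
    }
}
```

The {{while}} loop around {{wait()}} also covers the case where Thread A 
finishes before Thread B starts waiting.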


                
> Race conditions in IndexCache
> -----------------------------
>
>                 Key: MAPREDUCE-4384
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4384
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.0.0-alpha
>            Reporter: Kihwal Lee
>             Fix For: 0.23.3, 2.0.1-alpha, 3.0.0
>
>
> TestIndexCache is intermittently failing due to a race condition. Upon 
> inspection of the IndexCache implementation, more potential issues have been 
> discovered.


        
