[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085224#comment-13085224
 ] 

Hudson commented on MAPREDUCE-2541:
-----------------------------------

Integrated in Hadoop-Common-trunk-Commit #742 (See 
[https://builds.apache.org/job/Hadoop-Common-trunk-Commit/742/])
    MAPREDUCE-2541. Fixed a race condition in IndexCache.removeMap. Contributed 
by Binglin Chang.

acmurthy : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1157346
Files : 
* /hadoop/common/trunk/mapreduce/src/test/mapred/org/apache/hadoop/mapred/TestIndexCache.java
* /hadoop/common/trunk/mapreduce/CHANGES.txt
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/IndexCache.java


> Race Condition in IndexCache(readIndexFileToCache,removeMap) causes value of 
> totalMemoryUsed corrupt, which may cause TaskTracker continue throw Exception
> ----------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2541
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2541
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>    Affects Versions: 0.20.1, 0.21.0, 0.22.0, 0.23.0
>         Environment: all
>            Reporter: Binglin Chang
>            Assignee: Binglin Chang
>            Priority: Critical
>             Fix For: 0.23.0
>
>         Attachments: MAPREDUCE-2541.patch, MAPREDUCE-2541.v2.patch
>
>
> The race condition goes like this:
> Thread1: readIndexFileToCache()  totalMemoryUsed.addAndGet(newInd.getSize())
> Thread2: removeMap() totalMemoryUsed.addAndGet(-info.getSize());
> If the client kills the job while the SpillRecord is still being read from 
> the file system, info.getSize() is 0 in removeMap(), so totalMemoryUsed is 
> not actually reduced. When Thread1 finishes reading the SpillRecord, it adds 
> the real index size to totalMemoryUsed, which leaves the value of 
> totalMemoryUsed too large.
> When totalMemoryUsed exceeds totalMemoryAllowed (this usually happens when a 
> very large job with a very large number of reduces is killed by the user, 
> probably because the user set the wrong number of reduces by mistake) while 
> the indexCache has not actually cached anything, freeIndexInformation() 
> throws exceptions constantly.
> A quick fix for this issue is to make removeMap() do nothing and let 
> freeIndexInformation() alone do this job.
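
The interleaving described in the report can be replayed deterministically on a single thread. The sketch below is only an illustration of the accounting drift, not the Hadoop code or the committed patch: the Info class and both method names are hypothetical stand-ins, and the second method models the "quick fix" from the description under the assumption that a later eviction subtracts the entry's real size.

```java
import java.util.concurrent.atomic.AtomicLong;

class IndexCacheRaceSketch {
    // Hypothetical stand-in for a cache entry; not the real Hadoop class.
    static final class Info {
        long size = 0;                       // stays 0 until the spill index is read
        long getSize() { return size; }
    }

    /** Replays the reported interleaving; returns the final counter value. */
    static long buggyInterleaving(long indexSize) {
        AtomicLong totalMemoryUsed = new AtomicLong();
        Info info = new Info();                      // Thread1 starts reading the index
        totalMemoryUsed.addAndGet(-info.getSize());  // Thread2: removeMap() subtracts 0
        info.size = indexSize;                       // Thread1 finishes reading
        totalMemoryUsed.addAndGet(info.getSize());   // Thread1 adds the real size
        return totalMemoryUsed.get();                // entry removed, bytes still counted
    }

    /** Same interleaving when removeMap() is a no-op (assumed quick fix). */
    static long quickFixInterleaving(long indexSize) {
        AtomicLong totalMemoryUsed = new AtomicLong();
        Info info = new Info();
        // removeMap() does nothing here, so the entry stays in the cache.
        info.size = indexSize;
        totalMemoryUsed.addAndGet(info.getSize());
        // A later eviction sees the real size and subtracts it.
        totalMemoryUsed.addAndGet(-info.getSize());
        return totalMemoryUsed.get();
    }

    public static void main(String[] args) {
        System.out.println("buggy: " + buggyInterleaving(4096));        // prints "buggy: 4096"
        System.out.println("quick fix: " + quickFixInterleaving(4096)); // prints "quick fix: 0"
    }
}
```

In the first case the counter keeps 4096 bytes for an entry that no longer exists; repeated over many map outputs of a killed job, this is how the counter could drift past the allowed limit while the cache is empty.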

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
