[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777472#action_12777472
 ] 

Hemanth Yamijala commented on MAPREDUCE-1140:
---------------------------------------------

bq. This is done, because getLocalCache increments referenceCount first and 
then localizes. Reference count should be decremented for the one just failed 
also. So, it should be added to the list before the getLocalCache call.

Umm. But (at least theoretically), it is still possible that a call to 
getLocalCache fails before referenceCount is incremented. For example, 
makeRelative can throw IOException, and so can getLocalCacheForWrite. Hence, we 
still have a situation where we record a file as being localized (by storing it 
in localizedCacheFiles), but the reference count is not actually incremented. 
releaseCache would then still have the bug this JIRA is talking about.
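
To make the failure mode above concrete, here is a minimal sketch. It is not 
the Hadoop code: the class NegativeRefCountSketch, its simplified method 
signatures and the failEarly flag are all assumptions made for illustration. 
It only models "record the file first, then call getLocalCache" together with 
an exception thrown before the count is incremented.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the scenario; names mirror the discussion, not real Hadoop signatures.
class NegativeRefCountSketch {
    private final Map<String, Integer> refCounts = new HashMap<>();
    private final List<String> localizedCacheFiles = new ArrayList<>();

    int refCount(String uri) {
        return refCounts.getOrDefault(uri, 0);
    }

    // failEarly simulates an IOException raised before the increment
    // (for example from makeRelative).
    void getLocalCache(String uri, boolean failEarly) throws IOException {
        if (failEarly) {
            throw new IOException("simulated failure before increment: " + uri);
        }
        refCounts.merge(uri, 1, Integer::sum);
    }

    // Records the file before calling getLocalCache, as in the approach being reviewed.
    void localize(String uri, boolean failEarly) {
        localizedCacheFiles.add(uri);
        try {
            getLocalCache(uri, failEarly);
        } catch (IOException e) {
            // Task setup fails here; the file stays recorded in localizedCacheFiles.
        }
    }

    // Decrements the count for every recorded file, whether or not it was incremented.
    void releaseAll() {
        for (String uri : localizedCacheFiles) {
            refCounts.merge(uri, -1, Integer::sum);
        }
        localizedCacheFiles.clear();
    }
}
```

In this model, calling localize with failEarly set to true and then releaseAll 
leaves the count for that file at -1, which is the negative refcount the issue 
describes.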

I am also slightly uncomfortable about the duplication of state introduced by 
the new list localizedCacheFiles.

Here's an alternate proposal:

- Add a boolean field isLocalized to CacheFile. It is false by default and is 
set to true only if distributedCacheManager.getLocalCache returns successfully.
- To handle the case you mentioned above, where a failure can happen after 
referenceCount is incremented in getLocalCache, I would suggest we catch 
exceptions inside getLocalCache and, on an exception, decrement the 
referenceCount and re-throw the exception. This seems right to me: if 
getLocalCache doesn't complete, shouldn't we be consistent and decrement the 
reference count?
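
A rough sketch of the proposal above, under stated assumptions: 
CacheFileSketch, CacheManagerSketch and TaskSketch are hypothetical stand-ins, 
not the actual Hadoop classes, the two boolean flags only simulate where a 
failure happens, and the returned local path is made up. The point is just to 
show the isLocalized flag together with the decrement-and-rethrow inside 
getLocalCache.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-task record of one cache file.
class CacheFileSketch {
    final String uri;
    boolean isLocalized = false; // set to true only after getLocalCache returns successfully

    CacheFileSketch(String uri) {
        this.uri = uri;
    }
}

// Hypothetical stand-in for the cache manager; signatures are simplified.
class CacheManagerSketch {
    private final Map<String, Integer> refCounts = new HashMap<>();

    int refCount(String uri) {
        return refCounts.getOrDefault(uri, 0);
    }

    // failBefore / failAfter simulate an IOException before or after the increment.
    String getLocalCache(String uri, boolean failBefore, boolean failAfter) throws IOException {
        if (failBefore) {
            throw new IOException("simulated failure before increment: " + uri);
        }
        refCounts.merge(uri, 1, Integer::sum);
        try {
            if (failAfter) {
                throw new IOException("simulated localization failure: " + uri);
            }
            return "/local/" + uri; // made-up path, for illustration only
        } catch (IOException e) {
            // Undo the increment so a failed call leaves the count unchanged, then re-throw.
            refCounts.merge(uri, -1, Integer::sum);
            throw e;
        }
    }

    void releaseCache(String uri) {
        refCounts.merge(uri, -1, Integer::sum);
    }
}

class TaskSketch {
    // Returns true if the file was localized; the flag is set only on success.
    static boolean setup(CacheManagerSketch manager, CacheFileSketch file,
                         boolean failBefore, boolean failAfter) {
        try {
            manager.getLocalCache(file.uri, failBefore, failAfter);
            file.isLocalized = true;
            return true;
        } catch (IOException e) {
            return false; // isLocalized stays false
        }
    }

    // Releases only files that were actually localized by this task.
    static void release(CacheManagerSketch manager, CacheFileSketch file) {
        if (file.isLocalized) {
            manager.releaseCache(file.uri);
            file.isLocalized = false;
        }
    }
}
```

With this shape the count cannot drop below zero in either failure case: a 
failure before the increment never sets isLocalized, and a failure after the 
increment is undone inside getLocalCache before the exception propagates.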

Would this work?

> Per cache-file refcount can become negative when tasks release 
> distributed-cache files
> --------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1140
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1140
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>    Affects Versions: 0.20.2, 0.21.0, 0.22.0
>            Reporter: Vinod K V
>            Assignee: Amareshwari Sriramadasu
>         Attachments: patch-1140-1.txt, patch-1140-ydist.txt, patch-1140.txt
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
