[
https://issues.apache.org/jira/browse/MAPREDUCE-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Amareshwari Sriramadasu updated MAPREDUCE-1098:
-----------------------------------------------
Attachment: patch-1098-2.txt
bq. * in getLocalCache(), refcount should be incremented only after
localizeCache(). This was the earlier behavior and we should retain it.
This cannot be moved to the other lock: if a delete happens in between
"marking for use" and "localizing" while the reference count is zero, the
delete will go ahead and remove the cache. So the reference count should be
incremented before localizing.
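To illustrate the ordering argument, here is a minimal sketch. It is not the code from the patch: {{CacheEntry}}, {{RefCountSketch}}, {{acquire}}, {{release}} and {{purgeUnused}} are hypothetical names, and the slow localization step is reduced to setting a flag. The point is only that the reference count is raised under the shared lock before localization starts, so a concurrent purge that skips entries with a non-zero count cannot remove an entry that is about to be localized.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a cache entry; not the actual Hadoop class.
class CacheEntry {
    int refCount = 0;          // tasks currently using this entry (guarded by the map lock)
    boolean localized = false; // guarded by the entry's own lock
}

class RefCountSketch {
    private final Map<String, CacheEntry> entries = new HashMap<>();

    // Mark the entry as in use under the map-wide lock, then localize
    // under the per-entry lock so other entries are not blocked.
    CacheEntry acquire(String key) {
        CacheEntry entry;
        synchronized (entries) {
            entry = entries.computeIfAbsent(key, k -> new CacheEntry());
            entry.refCount++; // incremented BEFORE localization begins
        }
        synchronized (entry) {
            if (!entry.localized) {
                entry.localized = true; // placeholder for the slow copy from the DFS
            }
        }
        return entry;
    }

    // Drop one reference once the task no longer needs the entry.
    void release(String key) {
        synchronized (entries) {
            CacheEntry entry = entries.get(key);
            if (entry != null && entry.refCount > 0) {
                entry.refCount--;
            }
        }
    }

    // Remove only entries nobody is using; returns how many were removed.
    int purgeUnused() {
        synchronized (entries) {
            int before = entries.size();
            entries.values().removeIf(e -> e.refCount == 0);
            return before - entries.size();
        }
    }
}
```

If the increment were moved after the localization step, a purge running between the two synchronized blocks would see a count of zero and remove the entry.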
bq. * Similarly, decrement of baseDirSize should be done only after the
filesystem delete.
Incorporated in the patch.
> Incorrect synchronization in DistributedCache causes TaskTrackers to freeze
> up during localization of Cache for tasks.
> ----------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1098
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1098
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: tasktracker
> Reporter: Sreekanth Ramakrishnan
> Assignee: Amareshwari Sriramadasu
> Fix For: 0.21.0
>
> Attachments: patch-1098-0.20.txt, patch-1098-1.txt, patch-1098-2.txt,
> patch-1098.txt
>
>
> Currently {{org.apache.hadoop.filecache.DistributedCache.getLocalCache(URI,
> Configuration, Path, FileStatus, boolean, long, Path, boolean)}} allows only
> one {{TaskRunner}} thread in TT to localize {{DistributedCache}} across jobs.
> The current synchronization is across baseDir; this has to be changed to
> lock on the same baseDir.
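As a rough sketch of what locking per baseDir could look like (an assumption for illustration, not the approach taken in the attached patches): the hypothetical {{PerDirLocks}} class below keeps one lock object per base directory, so threads working under different base directories do not block each other, while threads working under the same one are serialized.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical helper; the class and method names are illustrative only.
class PerDirLocks {
    private final ConcurrentMap<String, Object> locks = new ConcurrentHashMap<>();

    // Return the single lock object associated with this base directory,
    // creating it atomically on first use.
    Object lockFor(String baseDir) {
        return locks.computeIfAbsent(baseDir, d -> new Object());
    }

    // Run the given work while holding only the lock for this base directory.
    void localize(String baseDir, Runnable work) {
        synchronized (lockFor(baseDir)) {
            work.run();
        }
    }
}
```

A caller would wrap the localization step in {{localize(baseDir, ...)}} instead of synchronizing on one object shared by all directories.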