[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated MAPREDUCE-1098:
-------------------------------------

    Status: Open  (was: Patch Available)

This is looking good, some comments:

# The old o.a.h.mapreduce.filecache.DistributedCache.releaseCache calls the new 
API with 0L as the mtime, which is guaranteed to fail! We should either fix it 
to use the right mtime, always use 0 as the mtime (i.e. override getKey()), 
or throw an exception. Silently failing to decrement the cache reference count 
is bad.
# I don't get why TrackerDistributedCacheManager.localizeCache actually deletes 
the file first... we should never have localized the file. Deleting it might 
hide a bug.
# We have long had a stupid problem where we localize /foo on HDFS to 
<jobcache>/foo/foo on the local fs. This patch continues that by doing some 
extra work (calling cacheFilePath(cacheStatus.localLoadPath)). Shouldn't we 
just fix it? Adding more code to perpetuate old bad practices is not useful.
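
To illustrate the first comment, here is a minimal Java sketch of why a release 
with a mismatched mtime cannot find the entry it is meant to decrement, and of 
the "throw an exception" option. It is not the patch's code: the class 
CacheRefCountSketch, its methods, and the uri#mtime key format are all 
hypothetical, and I am assuming the cache key is derived from the URI plus the 
mtime.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a reference-counted cache table; not Hadoop code.
class CacheRefCountSketch {
    private final Map<String, Integer> refCounts = new HashMap<>();

    // Assumption: the key combines the URI and the file's modification time.
    static String makeKey(String uri, long mtime) {
        return uri + "#" + mtime;
    }

    synchronized void acquire(String uri, long mtime) {
        refCounts.merge(makeKey(uri, mtime), 1, Integer::sum);
    }

    // Fails loudly when the key is unknown instead of silently doing nothing.
    synchronized void release(String uri, long mtime) {
        String key = makeKey(uri, mtime);
        Integer count = refCounts.get(key);
        if (count == null || count == 0) {
            throw new IllegalStateException("No referenced cache entry for " + key);
        }
        refCounts.put(key, count - 1);
    }

    synchronized int count(String uri, long mtime) {
        return refCounts.getOrDefault(makeKey(uri, mtime), 0);
    }
}
```

In this sketch, acquiring with mtime 1234 and releasing with 0L looks up a 
different key, so the count for the real entry never drops; throwing makes the 
mismatch visible instead of leaking the reference.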



> Incorrect synchronization in DistributedCache causes TaskTrackers to freeze 
> up during localization of Cache for tasks.
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1098
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1098
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>            Reporter: Sreekanth Ramakrishnan
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.21.0
>
>         Attachments: MAPREDUCE-1098.patch, MAPREDUCE-1098.patch, 
> MAPREDUCE-1098.patch, patch-1098-0.20.txt, patch-1098-1.txt, 
> patch-1098-2.txt, patch-1098-3.txt, patch-1098-ydist.txt, patch-1098.txt
>
>
> Currently {{org.apache.hadoop.filecache.DistributedCache.getLocalCache(URI, 
> Configuration, Path, FileStatus, boolean, long, Path, boolean)}} allows only 
> one {{TaskRunner}} thread in TT to localize {{DistributedCache}} across jobs. 
> The current synchronization is across baseDir; this has to be changed to 
> lock on the same baseDir.
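
One reading of the description above is that localization should serialize 
only on the same baseDir rather than across all of them. A minimal Java sketch 
of that idea, not the patch itself: PerDirLockSketch, lockFor and withLock are 
hypothetical names, and base directories are represented as plain strings.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

// Hypothetical per-directory lock table; not Hadoop code.
class PerDirLockSketch {
    private final ConcurrentMap<String, Object> locks = new ConcurrentHashMap<>();

    // Returns the single lock object associated with this base directory.
    Object lockFor(String baseDir) {
        return locks.computeIfAbsent(baseDir, k -> new Object());
    }

    // Runs the work while holding only the lock for the given directory,
    // so threads working on different directories do not block each other.
    <T> T withLock(String baseDir, Supplier<T> work) {
        synchronized (lockFor(baseDir)) {
            return work.get();
        }
    }
}
```

Note that this sketch never removes lock objects, so the table grows with the 
number of distinct directories seen.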

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
