[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790095#action_12790095
 ] 

Hemanth Yamijala commented on MAPREDUCE-1186:
---------------------------------------------

Amarsri, Vinod and I discussed the trunk patch a bit. The current 
implementation attempts to work as follows:
- Before task launch, the task controller is launched to secure localized cache 
files. Previously, all files under $mapred-local-dir/$user/taskTracker/archive 
were secured. Obviously, we are trying to fix that in the context of this JIRA.
- The patch lists the directories under 
$mapred-local-dir/$user/taskTracker/archive which, after MAPREDUCE-1098, are 
the random id directories that were localized.
- For each directory, if the path is not already secured, it secures it 
recursively.

This approach has a race condition that we identified:
- Say a task has localized a file and has launched the task controller to 
secure the path, and the task controller is currently under operation.
- In parallel, say another task localized another file into a different random 
id directory.
- While listing directories, the task controller could pick up the random id 
directory created by the second task and set permissions on it. However, 
localization into this directory has not finished at that point, so the 
directory would end up incompletely secured.
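
To make the interleaving concrete, here is a minimal in-memory sketch of the 
race. It is not the task controller code: the class name ListingRaceDemo, its 
methods, and the directory and file names are all made up for illustration, 
and "securing" is modelled as adding a path to a set rather than changing 
ownership or permissions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of the archive directory; not Hadoop code.
final class ListingRaceDemo {
    // random id directory -> files localized into it so far
    private final Map<String, List<String>> archive = new LinkedHashMap<>();
    // directories the task controller already treats as secured
    private final Set<String> securedDirs = new LinkedHashSet<>();
    // individual files whose permissions were actually set
    private final Set<String> securedFiles = new LinkedHashSet<>();

    void localize(String randomIdDir, String file) {
        archive.computeIfAbsent(randomIdDir, d -> new ArrayList<>()).add(file);
    }

    // Mirrors the listing approach: visit every directory under archive and
    // recursively secure the ones that are not yet marked as secured.
    void runTaskController() {
        for (Map.Entry<String, List<String>> e : archive.entrySet()) {
            if (securedDirs.add(e.getKey())) {
                for (String f : e.getValue()) {
                    securedFiles.add(e.getKey() + "/" + f);
                }
            }
        }
    }

    boolean isFileSecured(String dir, String file) {
        return securedFiles.contains(dir + "/" + file);
    }

    public static void main(String[] args) {
        ListingRaceDemo tt = new ListingRaceDemo();
        tt.localize("1234", "a.jar");   // task 1 has finished localizing
        tt.localize("5678", "part-0");  // task 2 is only half way through
        tt.runTaskController();         // launched for task 1, sees both dirs
        tt.localize("5678", "part-1");  // task 2 finishes localizing
        tt.runTaskController();         // launched for task 2, skips 5678
        System.out.println(tt.isFileSecured("5678", "part-0")); // true
        System.out.println(tt.isFileSecured("5678", "part-1")); // false
    }
}
```

In this model the second directory is marked as secured while it is still 
being filled, so the file that arrives afterwards is skipped on the next run.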

The key problem here is that this approach does not actually know which 
files were localized by a task as part of the distributed cache. One way to fix 
that would be to pass the paths to the task controller, as a list of random id 
directories under $mapred-local-dir/$user/taskTracker/archive that were 
localized in this task. This is what I suggested in the proposal above. 
However, there are a few problems with this proposal as well:

- How do we get the list of these paths? The distributed cache currently 
exposes no way to find out which files were localized.
- This could be a huge list if several tens of files are being localized in a 
task. How would we transfer all this info to the task-controller? A huge 
command line of paths to the task controller could be unmanageable, hit some 
command line length limits, etc. Other approaches (like transferring the info 
through a file) would also be cumbersome.
- It could result in duplicate work. If two tasks running in parallel share a 
file, both would be given the same random id directory, and both would try to 
secure the path.

To solve these problems, I am proposing the following:
- Change the directory structure for localized cache files to: 
$mapred-local-dir/$user/taskTracker/archive/$task-id, where task-id is for the 
task attempt on behalf of which localization is happening. Note that a task 
could use files that have already been localized for another task-id. Since a 
cache entry stores the full path for a cache key, the task can still look up 
where such a file lives.
- Move the securing of the cache file path into the same code path where the 
cache files are localized.
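
As a rough sketch of the proposed layout (not actual distributed cache code), 
the following shows a cache entry that records the full localized path, 
including the task-id of the attempt that localized it. The class 
TaskScopedCache, its method names, and the example ids are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the proposed directory layout; names are illustrative.
final class TaskScopedCache {
    private final Path archiveRoot;
    // cache key -> full localized path, including the localizing task-id
    private final Map<String, Path> entries = new HashMap<>();

    TaskScopedCache(String mapredLocalDir, String user) {
        // $mapred-local-dir/$user/taskTracker/archive
        this.archiveRoot = Paths.get(mapredLocalDir, user, "taskTracker", "archive");
    }

    // Returns the localized path for cacheKey. The first caller decides the
    // task-id directory; later callers reuse the stored path as-is.
    Path localize(String cacheKey, String taskId, String fileName) {
        Path existing = entries.get(cacheKey);
        if (existing != null) {
            return existing; // may live under another task-id
        }
        Path localized = archiveRoot.resolve(taskId).resolve(fileName);
        // A real implementation would copy the file and secure it here.
        entries.put(cacheKey, localized);
        return localized;
    }
}
```

With this shape, a second attempt asking for the same cache key gets back the 
path under the first attempt's directory instead of creating its own copy.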

The last point is actually important in this new proposal, because without 
it we might have a situation where a task uses files that were localized on 
behalf of a prior task-id but are not yet secured. If we don't wait for the 
securing to finish, we would have incompletely secured cache files in use.
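
One way this could look, purely as an assumption on my part about the shape of 
the code: localization and securing run under the same per-entry lock, so a 
task reusing the entry blocks until both are done. SecuredOnLocalize, Entry 
and getLocalCache below are made-up names, and the copy and permission steps 
are stand-ins.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: securing happens in the localization code path.
final class SecuredOnLocalize {
    static final class Entry {
        private String path;     // null until localized and secured
        private boolean secured;

        // Both steps run under the entry's monitor, so a task that reuses
        // this entry waits here until the path is localized and secured.
        synchronized String ensureReady(String key, String taskId, AtomicInteger work) {
            if (path == null) {
                String p = "archive/" + taskId + "/" + key; // stands in for the copy
                secured = true;                             // stands in for chown/chmod
                work.incrementAndGet();
                path = p;
            }
            return path;
        }

        synchronized boolean isSecured() {
            return secured;
        }
    }

    private final ConcurrentHashMap<String, Entry> entries = new ConcurrentHashMap<>();
    // Counts how many times the localize-and-secure work actually ran.
    final AtomicInteger localizations = new AtomicInteger();

    String getLocalCache(String key, String taskId) {
        return entries.computeIfAbsent(key, k -> new Entry())
                      .ensureReady(key, taskId, localizations);
    }

    boolean isSecured(String key) {
        Entry e = entries.get(key);
        return e != null && e.isSecured();
    }
}
```

This also avoids the duplicate work mentioned earlier, since only the first 
caller for a cache key does the localizing and securing.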

One drawback I can think of with this approach is that the new task-id 
directory in the path might give the wrong impression that the files localized 
under it are all the files the task uses from the distributed cache. But 
clearly, files localized under other task-ids could be used as well.

Comments on this proposal?

> While localizing a DistributedCache file, TT sets permissions recursively on 
> the whole base-dir
> -----------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1186
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1186
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>    Affects Versions: 0.21.0
>            Reporter: Vinod K V
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.21.0
>
>         Attachments: patch-1186-1.txt, patch-1186-3-ydist.txt, 
> patch-1186-3-ydist.txt, patch-1186-ydist.txt, patch-1186-ydist.txt, 
> patch-1186.txt
>
>
> This is a performance problem.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
