[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj Das updated MAPREDUCE-1288:
-----------------------------------

    Attachment: MR-1288-bp20-1.patch

This bug surfaced on one of the secure Yahoo clusters. This is the scenario:
1. There is a file "/a/b/c/1.txt" on the hdfs which is private (one of the 
directories in the path leading up to the hdfs file does not have EXECUTE 
permissions for OTHERS).
2. A user "foo" uses this file in his job as a DistributedCache file, and the 
TTs localize this file in a location owned by user "foo" (since this file is 
private it ends up in the protected place).
3. A second user "bar" also tries to use the same file in his job. Both users 
belong to the same unix group.
4. Assume some TT that localized "/a/b/c/1.txt" file before, while running 
foo's task, got a task of bar's job. It concludes the file was already 
localized since the mapping has an entry for /a/b/c/1.txt (mapping refers to 
the mapping between the Cache URIs and the CacheStatus objects, maintained by 
TT). 
5. The TT doesn't localize this file again. It instead points the tasks to the 
file that was localized in step (2). Since the directory where the file was 
localized is not readable by anyone other than "foo", the tasks of "bar"'s job 
fail.
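
The scenario above can be reproduced in miniature. The following is only a 
sketch: the class and method names (UriKeyedCacheSketch, CacheEntry, localize, 
canRead) and the "/local/private/" path are hypothetical stand-ins, not the 
actual TT or DistributedCache code. It assumes the mapping is keyed by the 
cache URI alone and that a private file is readable only by the user who owns 
the directory it was localized into.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the TT's cache bookkeeping; not the real Hadoop classes.
class UriKeyedCacheSketch {
    // Records where a cache file was localized and which user owns that directory.
    static final class CacheEntry {
        final String localPath;
        final String owner;

        CacheEntry(String localPath, String owner) {
            this.localPath = localPath;
            this.owner = owner;
        }
    }

    // Keyed by cache URI only, which is the behaviour described in steps 4 and 5.
    private final Map<String, CacheEntry> cache = new HashMap<>();

    // Returns the existing entry if the URI was seen before; otherwise
    // "localizes" the file into a directory owned by the requesting user.
    CacheEntry localize(String uri, String user) {
        return cache.computeIfAbsent(uri,
            u -> new CacheEntry("/local/private/" + user + "/" + fileName(u), user));
    }

    // Assumption: a private localized file is readable only by the directory owner.
    static boolean canRead(CacheEntry entry, String user) {
        return entry.owner.equals(user);
    }

    private static String fileName(String uri) {
        return uri.substring(uri.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        UriKeyedCacheSketch tt = new UriKeyedCacheSketch();
        CacheEntry forFoo = tt.localize("/a/b/c/1.txt", "foo"); // step 2
        CacheEntry forBar = tt.localize("/a/b/c/1.txt", "bar"); // steps 4 and 5
        System.out.println("foo can read: " + canRead(forFoo, "foo"));
        System.out.println("bar can read: " + canRead(forBar, "bar"));
    }
}
```

In this sketch the second call returns the entry created for "foo", so the 
readability check for "bar" comes out false, mirroring the task failure in 
step 5.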

I guess this issue didn't arise earlier (pre-security) since the 
distributed cache files, even if they were private, were getting localized in 
directories that were readable by all users.

Attaching a patch for Y20S that addresses the issue.
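
The patch itself is not reproduced in this message. As a rough illustration 
of the direction suggested in the issue description (make the key include the 
username), here is a minimal sketch. It is not taken from the patch: 
PerUserCacheSketch, CacheKey, localize, entryCount, and the "/local/private/" 
path are all hypothetical names chosen for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch of a per-user cache mapping; not the code in the attached patch.
class PerUserCacheSketch {
    // Composite key: the cache URI plus the user the file is localized for.
    static final class CacheKey {
        final String uri;
        final String user;

        CacheKey(String uri, String user) {
            this.uri = Objects.requireNonNull(uri);
            this.user = Objects.requireNonNull(user);
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof CacheKey)) return false;
            CacheKey other = (CacheKey) o;
            return uri.equals(other.uri) && user.equals(other.user);
        }

        @Override
        public int hashCode() {
            return Objects.hash(uri, user);
        }
    }

    // Maps (URI, user) to the local path of that user's private copy.
    private final Map<CacheKey, String> localPaths = new HashMap<>();

    // Localizes once per (URI, user) pair instead of once per URI.
    String localize(String uri, String user) {
        return localPaths.computeIfAbsent(new CacheKey(uri, user),
            k -> "/local/private/" + k.user + "/"
                + k.uri.substring(k.uri.lastIndexOf('/') + 1));
    }

    int entryCount() {
        return localPaths.size();
    }

    public static void main(String[] args) {
        PerUserCacheSketch tt = new PerUserCacheSketch();
        System.out.println(tt.localize("/a/b/c/1.txt", "foo"));
        System.out.println(tt.localize("/a/b/c/1.txt", "bar"));
        System.out.println("entries: " + tt.entryCount());
    }
}
```

With a key like this, "foo" and "bar" each get their own localized copy, while 
repeated tasks of the same user still reuse that user's copy. Whether public 
files should keep a URI-only key so they can be shared is left out of the 
sketch.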

> DistributedCache localizes only once per cache URI
> --------------------------------------------------
>
>                 Key: MAPREDUCE-1288
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1288
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: distributed-cache, security, tasktracker
>    Affects Versions: 0.21.0
>            Reporter: Devaraj Das
>            Priority: Critical
>         Attachments: MR-1288-bp20-1.patch
>
>
> As part of the file localization the distributed cache localizer creates a 
> copy of the file in the corresponding user's private directory. The 
> localization in DistributedCache assumes the key as the URI of the cachefile 
> and if it already exists in the map, the localization is not done again. This 
> means that another user cannot access the same distributed cache file. We 
> should change the key to include the username so that localization is done 
> for every user.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
