[
https://issues.apache.org/jira/browse/MAPREDUCE-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Devaraj Das updated MAPREDUCE-1288:
-----------------------------------
Attachment: MR-1288-bp20-1.patch
This bug surfaced on one of the secure Yahoo clusters. This is the scenario:
1. There is a file "/a/b/c/1.txt" on HDFS that is private (one of the
directories in the path leading up to the file does not have EXECUTE
permission for OTHERS).
2. A user "foo" uses this file in his job as a DistributedCache file, and the
TTs localize this file in a location owned by user "foo" (since this file is
private, it ends up in the protected location).
3. A second user "bar" also tries to use the same file in his job. Both users
belong to the same unix group.
4. Assume some TT that previously localized "/a/b/c/1.txt" while running
foo's task is now assigned a task of bar's job. It concludes the file was
already localized since the mapping has an entry for /a/b/c/1.txt (mapping
refers to the mapping between the Cache URIs and the CacheStatus objects,
maintained by the TT).
5. The TT doesn't localize this file again. It instead points the tasks to the
file that was localized in step (2). Since the directory where the file was
localized is not readable by anyone other than "foo", the tasks of "bar"'s job
fail.
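
To make the steps above concrete, here is a minimal, self-contained sketch of
the lookup. It is not the TaskTracker code and not the attached patch: the
class CacheKeySketch, the Entry type (a stand-in for CacheStatus), the
"/local/private/<user>" directory layout and the "user|uri" key format are all
assumptions made up for illustration. It only contrasts a map keyed by the
cache URI alone with one whose key also includes the user name.

```java
import java.util.HashMap;
import java.util.Map;

public class CacheKeySketch {

    // Hypothetical stand-in for the TT's per-file bookkeeping
    // (not Hadoop's real CacheStatus class).
    static final class Entry {
        final String owner;      // user whose task triggered the localization
        final String localPath;  // where the copy lives on the TT's local disk

        Entry(String owner, String localPath) {
            this.owner = owner;
            this.localPath = localPath;
        }
    }

    // Assumed directory layout: each user's private files live under a
    // directory that only that user can read.
    static String privatePathFor(String user, String uri) {
        return "/local/private/" + user + uri;
    }

    // Behaviour described in the scenario: the map is keyed by the URI only,
    // so the first user to localize a file "wins" for everyone.
    static final Map<String, Entry> byUri = new HashMap<>();

    static Entry localizeByUri(String uri, String user) {
        Entry e = byUri.get(uri);
        if (e == null) {
            e = new Entry(user, privatePathFor(user, uri));
            byUri.put(uri, e);
        }
        return e; // may point into another user's private directory
    }

    // Alternative: the key also carries the user name, so each user gets
    // a separate entry (and a separate local copy) for the same URI.
    static final Map<String, Entry> byUserAndUri = new HashMap<>();

    static Entry localizeByUserAndUri(String uri, String user) {
        String key = user + "|" + uri; // illustrative key format only
        Entry e = byUserAndUri.get(key);
        if (e == null) {
            e = new Entry(user, privatePathFor(user, uri));
            byUserAndUri.put(key, e);
        }
        return e;
    }

    public static void main(String[] args) {
        String uri = "/a/b/c/1.txt";

        localizeByUri(uri, "foo");                // step 2
        Entry forBar = localizeByUri(uri, "bar"); // steps 4 and 5
        System.out.println("URI key:      bar -> " + forBar.localPath);

        localizeByUserAndUri(uri, "foo");
        Entry fixed = localizeByUserAndUri(uri, "bar");
        System.out.println("user+URI key: bar -> " + fixed.localPath);
    }
}
```

With the URI-only key, bar's task is handed a path under foo's private
directory, which is the failure in step (5). With the user name in the key,
bar gets its own entry, at the cost of one local copy per user.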
I guess this issue didn't arise earlier (pre-security) since the
distributed cache files, even if they were private, were getting localized in
directories that were readable by all users.
Attaching a patch for Y20S that addresses the issue.
> DistributedCache localizes only once per cache URI
> --------------------------------------------------
>
> Key: MAPREDUCE-1288
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1288
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: distributed-cache, security, tasktracker
> Affects Versions: 0.21.0
> Reporter: Devaraj Das
> Priority: Critical
> Attachments: MR-1288-bp20-1.patch
>
>
> As part of file localization, the distributed cache localizer creates a
> copy of the file in the corresponding user's private directory. The
> localization in DistributedCache uses the URI of the cache file as the key,
> and if that key already exists in the map, the localization is not done
> again. This means that another user cannot access the same distributed cache
> file. We should change the key to include the username so that localization
> is done for every user.