[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12774963#action_12774963
 ] 

Hemanth Yamijala commented on MAPREDUCE-1186:
---------------------------------------------

Some history:

Before HADOOP-4490, files and archives localized as part of the distributed 
cache were given executable permissions. I suppose the assumption was that the 
directories and files created during localization automatically had read 
permissions for their owner. And since the owner of the files, i.e. the user 
the tasktracker runs as, was also the owner of the task process, this was 
sufficient to access the cache files.

With HADOOP-4490, the tasktracker and the task could run as different users. 
The tracker localizes the files and the task needs to access them. So at a 
minimum, read and execute permissions on directories and files needed to be 
granted to others. As mentioned in the comment linked above, a choice was made 
to recursively set these permissions on all files starting from the base 
directory. This turned out to be a performance problem on clusters with a very 
large number of localized cache files.

In MAPREDUCE-856, to meet the requirement of securing access to the 
distributed cache files, the local directory structure was changed to be per 
user. Further, in the LinuxTaskController, ownership of all files under a 
user's archive folder was changed to that user, and permissions were set to 
give access only to that user. For the DefaultTaskController, the same changes 
as made in HADOOP-4490 were retained, though they were possibly unnecessary.

First, to revisit whether we need any permission setting for distributed cache 
files:

I think this is still required. For the DefaultTaskController, executable 
permissions need to be set on the localized files as in the pre-HADOOP-4490 
days. For the LinuxTaskController, we need to change ownership and set 
permissions in the task controller for that user.

However, in both cases, I suppose we only need to set permissions for files 
that are actually copied from DFS to the local file system (including any 
directories created in this process). This will address the issue raised in 
this JIRA.
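
To make the idea concrete, here is a rough sketch in plain Java NIO. It is not 
the TaskTracker code: the class LocalizeSketch, the method localize, and the 
rwxr-xr-x mode are all made up for illustration, and a local source directory 
stands in for DFS. It sets permissions only on the paths it creates while 
copying, and never walks the rest of the base directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch; names and permission mode are assumptions.
final class LocalizeSketch {
    // Assumed mode: owner full access, read and execute for everyone else.
    static final Set<PosixFilePermission> PERMS =
        PosixFilePermissions.fromString("rwxr-xr-x");

    /**
     * Copies src (a file or a directory tree) to baseDir/relativeDest and
     * sets permissions only on paths created by this call.
     * Returns the list of paths whose permissions were set.
     */
    static List<Path> localize(Path src, Path baseDir, String relativeDest)
            throws IOException {
        List<Path> touched = new ArrayList<>();
        Path dest = baseDir.resolve(relativeDest);

        // Create any missing directories between baseDir and dest's parent.
        Path current = baseDir;
        for (Path part : baseDir.relativize(dest.getParent())) {
            current = current.resolve(part);
            if (!Files.exists(current)) {
                Files.createDirectory(current);
                setPerms(current, touched);
            }
        }

        // Files.walk lists a directory before its contents, so parents
        // always exist by the time their children are copied.
        List<Path> sources;
        try (Stream<Path> walk = Files.walk(src)) {
            sources = walk.collect(Collectors.toList());
        }
        for (Path p : sources) {
            Path target = dest.resolve(src.relativize(p));
            if (Files.isDirectory(p)) {
                Files.createDirectory(target);
            } else {
                Files.copy(p, target);
            }
            setPerms(target, touched);
        }
        return touched;
    }

    private static void setPerms(Path p, List<Path> touched)
            throws IOException {
        Files.setPosixFilePermissions(p, PERMS);
        touched.add(p);
    }
}
```

The cost of a call is then proportional to the size of the file or archive 
being localized, not to the number of files already present under the base 
directory.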

> While localizing a DistributedCache file, TT sets permissions recursively on 
> the whole base-dir
> -----------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1186
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1186
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>    Affects Versions: 0.21.0
>            Reporter: Vinod K V
>             Fix For: 0.21.0
>
>
> This is a performance problem.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
