[ 
https://issues.apache.org/jira/browse/HADOOP-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12717253#action_12717253
 ] 

Vinod K V commented on HADOOP-4491:
-----------------------------------

Some (broad) proposals for solving this issue:

*Localization*

 (A) Move the whole localization out of the TaskTracker, to be done as the user.
    - Adv: Because everything is done by the user, there is no hassle of 
changing permissions every now and then in the TT. We just need to support 
reading of data back by the TT for serving.
    - DisAdv: (As Devaraj pointed out in a quick chat) Synchronizing 
localization across the different processes becomes quite complicated.

 (B) Separate tt-only and child-only space from shared space. The tt-only and 
child-only spaces are exclusively for the TT and the child respectively. The TT 
does localization in the tt-only area; the task-controller binary then moves 
the directory structure to the child-only area. The shared space is for the 
stuff generated by the child for the TT and has restricted access (511 on dirs 
and 444 on files) for the TT and others. Even though other users can read this 
area, they won't be able to delete/write stuff.
    - Adv: Keeps things very simple
    - DisAdv: Sacrifices some of the strict 700 access restrictions in favour 
of the more manageable 511/444 permissions.
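
To make the 511/444 idea in (B) concrete, here is a minimal sketch of how the 
shared space could be locked down once the child has finished writing to it. 
This is only an illustration, not a proposed patch: the class name 
SharedSpacePermissions and the method restrict() are made up, and it assumes a 
POSIX file system and the java.nio.file API rather than whatever the 
task-controller binary would actually do.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

// Hypothetical helper, for illustration only.
public class SharedSpacePermissions {
    // 511 on directories: owner r-x, group --x, others --x
    static final Set<PosixFilePermission> DIR_511 =
        PosixFilePermissions.fromString("r-x--x--x");
    // 444 on files: read-only for owner, group and others
    static final Set<PosixFilePermission> FILE_444 =
        PosixFilePermissions.fromString("r--r--r--");

    // Walks the shared area and applies 444 to files and 511 to directories.
    // Directories are changed after their contents have been visited.
    static void restrict(Path sharedRoot) throws IOException {
        Files.walkFileTree(sharedRoot, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                    throws IOException {
                Files.setPosixFilePermissions(file, FILE_444);
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path dir, IOException exc)
                    throws IOException {
                if (exc != null) {
                    throw exc;
                }
                Files.setPosixFilePermissions(dir, DIR_511);
                return FileVisitResult.CONTINUE;
            }
        });
    }
}
```

Note that once a directory is at 511 even its owner can no longer create or 
delete entries in it, so a step like this would have to run only after the 
child is done generating output.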

 (C) Instead of separating the directory structures completely, use the same 
directory structure for both the TT and the user wherever necessary.
    - Adv : Avoids replication of the directory structure
    - DisAdv: Paths closer to the mapred-local-dir are owned by the TT, and 
paths further down are owned by the child. Currently, tasks use the same 
mapred.local.dir as the TaskTracker. When tasks need a path for writing their 
output, the LocalDirAllocator checks write permission on the root directory, 
which is owned by the TT only, and so would fail. We will have to handle this 
by modifying the mapred-local-dir of the child.
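
A rough sketch of what "modifying the mapred-local-dir of the child" in (C) 
could look like: each TT-owned root is rebased onto the child-owned directory 
further down, so that any write check is made against a path the child can 
actually write to. The class ChildLocalDirs, the method rebase() and the 
childSubPath argument are hypothetical names for illustration; this is not 
LocalDirAllocator's code, and the real layout under mapred.local.dir may 
differ.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only.
public class ChildLocalDirs {
    // For every TT-owned local dir root, derive the child-owned directory
    // below it and keep only those the current user can write to.
    static List<Path> rebase(List<Path> ttLocalDirs, String childSubPath) {
        List<Path> childDirs = new ArrayList<>();
        for (Path root : ttLocalDirs) {
            Path childDir = root.resolve(childSubPath);
            // The check is made on the child-owned path, not on the root.
            if (Files.isDirectory(childDir) && Files.isWritable(childDir)) {
                childDirs.add(childDir);
            }
        }
        return childDirs;
    }
}
```

The resulting list would then be handed to the child as its own local-dir 
setting instead of the TT's value.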

*Intermediate output*
 - If we choose (A) or (C) for localization, we need to run the task-controller 
again to make the output accessible to the TT.
 - If we choose (B) for localization, intermediate output is automatically 
available to the TT.

*Task logs*
 - If we choose (A) or (C), whenever there is a request for the logs, we need 
to run the task-controller to stream the logs. Logs can be moved to a 
tt-accessible area once the task finishes.
 - If we choose (B), task logs can be put in the shared space readable by all 
users, and so are automatically available.

Given these trade-offs, I think that even though (B) relaxes some of the strict 
700 restrictions to a more permissive 511/444, it keeps things simple. But I am 
open to other proposals too. Thoughts?

> Per-job local data on the TaskTracker node should have right access-control
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-4491
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4491
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Vinod K V
>


