[
https://issues.apache.org/jira/browse/HADOOP-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12717253#action_12717253
]
Vinod K V commented on HADOOP-4491:
-----------------------------------
Some (broad) proposals for solving this issue:
*Localization*
(A) Move the whole localization out of the TaskTracker, to be done as the user.
- Adv: Because everything is done by the user, there is no hassle of
changing permissions back and forth in the TT. We just need to support the TT
reading the data back for serving.
- DisAdv: (As Devaraj pointed out in a quick chat) Synchronizing
localization across the different processes becomes quite complicated.
(B) Separate tt-only and child-only spaces from a shared space. The tt-only and
child-only spaces are exclusively for the TT and the child respectively. The TT
does localization in the tt-only area; the task-controller binary then moves
the directory structure to the child-only area. The shared space is for the
stuff generated by the child for the TT and has restricted access (511 on dirs
and 444 on files) for the TT and others. Even though other users can read this
area, they won't be able to delete/write stuff.
- Adv: Keeps things very simple
- DisAdv: Sacrifices some of the strict 700 access restrictions in favour of
the more manageable 511/444 permissions.
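To make the 511/444 scheme in (B) concrete, here is a minimal sketch of how the
shared space could be locked down. The class and method names are hypothetical
and not part of Hadoop; it assumes a POSIX file system and that the caller owns
the files. 511 is r-x--x--x (the owner can list, everyone can traverse, nobody
can create or delete entries) and 444 is r--r--r-- (read-only for everyone).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for illustration only; not an existing Hadoop class.
class SharedSpacePermissions {
    // 511: no write bit anywhere, so entries cannot be added or deleted.
    static final Set<PosixFilePermission> DIR_PERMS =
        PosixFilePermissions.fromString("r-x--x--x");
    // 444: files are readable by everyone and writable by no one.
    static final Set<PosixFilePermission> FILE_PERMS =
        PosixFilePermissions.fromString("r--r--r--");

    // Applies 511 to every directory and 444 to every file under sharedRoot.
    static void restrict(Path sharedRoot) throws IOException {
        List<Path> paths;
        // Collect the tree first, then change permissions.
        try (Stream<Path> walk = Files.walk(sharedRoot)) {
            paths = walk.collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.setPosixFilePermissions(
                p, Files.isDirectory(p) ? DIR_PERMS : FILE_PERMS);
        }
    }

    // Renders a permission set as an octal string, for logging or checks.
    static String toOctal(Set<PosixFilePermission> perms) {
        String s = PosixFilePermissions.toString(perms); // e.g. "r-x--x--x"
        int mode = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) != '-') {
                mode |= 1 << (8 - i); // leftmost character is the 0400 bit
            }
        }
        return Integer.toOctalString(mode);
    }
}
```

In this sketch, restrict() would be called on the shared directory once the
child has finished writing to it; who runs it (the child itself or the
task-controller) is left open here, as it is in the proposal.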
(C) Instead of separating the directory structures completely, use the same
directory structure for both the TT and the user wherever necessary.
- Adv : Avoids replication of the directory structure
- DisAdv: Paths closer to the mapred-local-dir are owned by the TT, and paths
further down are owned by the child. Currently, tasks use the same
mapred.local.dir as the task-tracker. When a task needs a path for writing its
output, the LocalDirAllocator checks write permission on the root directory,
which is owned by the TT only, and so would fail. We will have to handle this
by modifying the mapred-local-dir of the child.
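The workaround for (C) can be sketched as follows: instead of handing the child
the TT-owned roots, hand it a rewritten list that points at child-owned
subdirectories, so that a writability check on the "root" succeeds. This is
only an illustration under assumptions: the class, the method names and the
subdirectory layout are hypothetical, and firstWritable() is a stand-in for the
kind of check described above, not the actual LocalDirAllocator code.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical helper for illustration only; not an existing Hadoop class.
class ChildLocalDirs {
    // Rewrites a comma-separated mapred.local.dir value so that each root
    // points at a child-owned subdirectory (childSubdir is an assumed layout).
    static List<Path> childRoots(String mapredLocalDir, String childSubdir) {
        List<Path> roots = new ArrayList<>();
        for (String dir : mapredLocalDir.split(",")) {
            String trimmed = dir.trim();
            if (!trimmed.isEmpty()) {
                roots.add(Paths.get(trimmed).resolve(childSubdir));
            }
        }
        return roots;
    }

    // Stand-in for an allocator-style check: returns the first root that is
    // an existing directory writable by the current process, if any.
    static Optional<Path> firstWritable(List<Path> roots) {
        for (Path root : roots) {
            if (Files.isDirectory(root) && Files.isWritable(root)) {
                return Optional.of(root);
            }
        }
        return Optional.empty();
    }
}
```

Run as the child user against the unmodified TT-owned roots, firstWritable()
would come back empty; against the rewritten roots it can succeed, provided
those subdirectories exist and are owned by the child.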
*Intermediate output*
- If we choose (A) or (C) for localization, we need to run the task-controller
again to make the output accessible to the TT.
- If we choose (B) for localization, intermediate output is automatically
available to the TT.
*Task logs*
- If we choose (A) or (C), whenever there is a request for the logs, we need to
run the task-controller to stream them. Logs can be moved to a tt-accessible
area once the task finishes.
- If we choose (B), task-logs can be put in the shared space readable by all
users, and so are automatically available.
Based on these, I think that even though (B) trades some of the strict 700
restrictions for the looser 511/444, it keeps things simple. But I am open to
other proposals too. Thoughts?
> Per-job local data on the TaskTracker node should have right access-control
> ---------------------------------------------------------------------------
>
> Key: HADOOP-4491
> URL: https://issues.apache.org/jira/browse/HADOOP-4491
> Project: Hadoop Core
> Issue Type: Sub-task
> Components: mapred, security
> Reporter: Arun C Murthy
> Assignee: Vinod K V
>