[ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649065#action_12649065 ]

Hemanth Yamijala commented on HADOOP-4490:
------------------------------------------

Before beginning discussions on the approach, I wanted to summarize my 
understanding of this task, and to raise a few points that I have questions 
about.

The following are some salient points:
# We want to run tasks as the user who submitted the job, rather than as the 
user running the daemon.
# I think we also don't want to run the daemon as a privileged user (such as 
root) in order to meet this requirement. Right?
# The directories and files used by the task should have appropriate 
permissions. Currently these directories and files are mostly created by the 
daemons but used by the task; a few are also used/accessed by the daemons. 
Some of these directories and files are the following:
## mapred.local.dir/taskTracker/archive - directories containing distributed 
cache archives
## mapred.local.dir/taskTracker/jobcache/$jobid/ - includes work (a scratch 
space), jars (containing the job jars) and job.xml.
## mapred.local.dir/taskTracker/jobcache/$jobid/$taskid - includes the task's 
job.xml, output (intermediate files), work (current working directory) and 
temp (work/tmp) directories.
## mapred.local.dir/taskTracker/pids/$taskid - Written by the shell launching 
the task, but read by the daemons.
# What should 'appropriate' permissions mean? I guess read/write/execute (on 
directories) is required for the owner of the job. What should the permissions 
be for others? If the task is the only consumer, then the permissions for 
others can be turned off. However, there are cases where the daemon / other 
processes might read the files. For instance:
## The distributed cache files can be shared across jobs.
## Jetty seems to require read permissions on the intermediate files to serve 
them to the reducers.
In the above cases, can we make these world-readable? (A sketch of one 
possible permission layout follows this list.)
## Task logs are currently generated under ${hadoop.log.dir}/userlogs/$taskid. 
These are served from the TaskLogServlet of the TaskTracker.
# Apart from launching the task itself, we may need some other actions to be 
performed as the job owner (a sketch of one possible mechanism for this also 
follows below). For instance:
## Killing a task
## Possibly setting up and cleaning up the directories / files
## Running the debug script - {{mapred.map|reduce.task.debug.script}}
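
To make the permissions question in points 3 and 4 concrete, here is a minimal 
sketch of what I mean: owner-only modes for per-job state, world-readable 
modes for the pieces other processes must read (the shared distributed cache, 
the intermediate outputs served by Jetty). This is illustrative only, not a 
proposed implementation; the class and method names are mine, and it assumes a 
POSIX file system and the java.nio.file API. Note also that setting modes 
alone does not change ownership; a chown to the job owner would need 
privilege, which is exactly the tension with point 2.

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

public class TaskDirPermissions {

    // Owner-only: per-job scratch space, jars, job.xml, task working dirs.
    private static final Set<PosixFilePermission> PRIVATE =
            PosixFilePermissions.fromString("rwx------");

    // World-readable: data that other processes must read, e.g. distributed
    // cache entries shared across jobs, and intermediate outputs that Jetty
    // serves to the reducers.
    private static final Set<PosixFilePermission> SHARED =
            PosixFilePermissions.fromString("rwxr-xr-x");

    static void setUpJobDirs(Path mapredLocalDir, String jobId, String taskId)
            throws IOException {
        Path archive = mapredLocalDir.resolve("taskTracker/archive");
        Path jobDir = mapredLocalDir.resolve("taskTracker/jobcache").resolve(jobId);
        Path taskDir = jobDir.resolve(taskId);

        Files.createDirectories(archive);
        Files.createDirectories(taskDir);

        // Private to the job owner. Setting the mode is not sufficient by
        // itself: the directories are still owned by the daemon user, and
        // chown-ing them to the job owner requires privilege (point 2).
        Files.setPosixFilePermissions(jobDir, PRIVATE);
        Files.setPosixFilePermissions(taskDir, PRIVATE);

        // Shared across jobs and readable by the daemon.
        Files.setPosixFilePermissions(archive, SHARED);
    }

    public static void main(String[] args) throws IOException {
        setUpJobDirs(Paths.get("/tmp/mapred-local"),
                "job_200811170001_0001",
                "attempt_200811170001_0001_m_000000_0");
    }
}
{code}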

Is there anything that I am missing? Comments on the questions about shared 
directories / files - distributed cache, intermediate outputs, log files?
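
On point 5, the sketch below shows one conceivable way to perform such an 
action (killing a task) as the job owner without running the daemon as root: 
shelling out through sudo. The sudo mechanism, the sudoers assumption and the 
names are all mine, purely for illustration; a setuid helper binary would be 
another option worth discussing.

{code:java}
import java.io.IOException;

public class RunAsJobOwner {

    /**
     * Kill a task's process as the job owner rather than as the daemon
     * user. Assumes a sudoers rule restricted to exactly this command;
     * nothing here is existing Hadoop API.
     */
    static int killAsUser(String jobOwner, String pid)
            throws IOException, InterruptedException {
        ProcessBuilder pb =
                new ProcessBuilder("sudo", "-u", jobOwner, "kill", "-TERM", pid);
        pb.inheritIO(); // let any sudo/kill errors show up in the daemon's log
        return pb.start().waitFor();
    }
}
{code}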

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>             Fix For: 0.20.0
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them 
> running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.
