[ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649065#action_12649065 ]
Hemanth Yamijala commented on HADOOP-4490:
------------------------------------------

Before beginning discussions on approach, I wanted to summarize my understanding of this task and start a discussion on a few points I have questions about. The salient points:
# We want to run tasks as the user who submitted the job, rather than as the user running the daemon.
# I think we also don't want to run the daemon as a privileged user (such as root) in order to meet this requirement. Right?
# The directories and files used by the task should have appropriate permissions. Currently these directories and files are mostly created by the daemons, but used by the task; a few are also used/accessed by the daemons. Some of them are:
## {{mapred.local.dir/taskTracker/archive}} - directories containing distributed cache archives.
## {{mapred.local.dir/taskTracker/jobcache/$jobid/}} - includes work (a scratch space), jars (containing the job jars) and job.xml.
## {{mapred.local.dir/taskTracker/jobcache/$jobid/$taskid}} - includes the job.xml, output (intermediate files), work (the current working directory) and temp (work/tmp) directories for the task.
## {{mapred.local.dir/taskTracker/pids/$taskid}} - written by the shell launching the task, but read by the daemons.
# What should 'appropriate' permissions mean? I guess read/write/execute (on directories) for the owner of the job is required. What should the permissions be for others? If the task is the only consumer, the permissions for others can be turned off. However, there are cases where the daemon or other processes might read the files. For instance:
## The distributed cache files can be shared across jobs.
## Jetty seems to require read permission on the intermediate files to serve them to the reducers. In the above cases, can we make these world readable? (A sketch of both permission modes follows the quoted summary below.)
## Task logs are currently generated under {{${hadoop.log.dir}/userlogs/$taskid}}. These are served from the TaskLogServlet of the TaskTracker.
# Apart from launching the task itself, we may need some other actions to be performed as the job owner (a sketch of one possible launcher mechanism also follows below). For instance:
## Killing a task
## Possibly setting up and cleaning up the directories/files
## Running the debug script - {{mapred.map|reduce.task.debug.script}}

Is there anything I am missing? Any comments on the questions about shared directories/files - distributed cache, intermediate outputs, log files?

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>             Fix For: 0.20.0
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.
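To make the permissions question in point 4 concrete, here is a minimal sketch of the two modes being discussed: owner-only access for a task's private scratch space, and world-readable access for shared material such as distributed cache archives and the intermediate output Jetty serves. This is plain Java rather than TaskTracker code; the root path and job id are made up for illustration, and it assumes the directories already belong to the job owner (points 1 and 2).

{code:java}
import java.io.IOException;
import java.nio.file.*;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

public class JobDirPermissions {
    public static void main(String[] args) throws IOException {
        // Illustrative root; a real TaskTracker would take this from mapred.local.dir.
        Path localDir = Paths.get("/tmp/mapred/local/taskTracker");
        Path jobDir   = localDir.resolve("jobcache/job_200811140001_0001");
        Path archive  = localDir.resolve("archive");

        Files.createDirectories(jobDir.resolve("work"));
        Files.createDirectories(archive);

        // Private scratch space: only the job owner may list, read or write it.
        Set<PosixFilePermission> ownerOnly = PosixFilePermissions.fromString("rwx------");
        Files.setPosixFilePermissions(jobDir.resolve("work"), ownerOnly);
        Files.setPosixFilePermissions(jobDir, ownerOnly);

        // Shared material (distributed cache, intermediate output served by Jetty):
        // the owner writes, everyone else may read and traverse.
        Set<PosixFilePermission> worldReadable = PosixFilePermissions.fromString("rwxr-xr-x");
        Files.setPosixFilePermissions(archive, worldReadable);
    }
}
{code}

Owner-only modes are of course only meaningful once the files are actually owned by the job submitter, which is where the launcher question comes in.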
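On point 5, one mechanism often suggested for doing per-user actions without running the daemon as root is a small setuid wrapper binary that switches to the job owner and then execs the requested command. To be clear, the wrapper below ({{/usr/local/bin/task-runner}}) and its argument convention are hypothetical, not an existing Hadoop binary; the sketch only illustrates the shape such calls could take from the TaskTracker side.

{code:java}
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TaskRunnerClient {
    // Hypothetical setuid-root wrapper: it would validate its arguments,
    // switch to <user>, and exec the remaining arguments as that user.
    private static final String WRAPPER = "/usr/local/bin/task-runner";

    public static Process runAs(String user, List<String> command) throws IOException {
        List<String> cmd = new ArrayList<String>();
        cmd.add(WRAPPER);    // assumed path to the wrapper
        cmd.add(user);       // the job owner
        cmd.addAll(command); // the task launch, kill, or debug-script command
        return new ProcessBuilder(cmd).redirectErrorStream(true).start();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Launching a task as the job owner; the launching shell would also write
        // the pid under mapred.local.dir/taskTracker/pids/$taskid for the daemon.
        runAs("alice", Arrays.asList("/bin/echo", "running as the job owner")).waitFor();

        // Killing a task (the pid is illustrative) goes through the same wrapper,
        // as could running the mapred.map|reduce.task.debug.script.
        runAs("alice", Arrays.asList("/bin/kill", "-TERM", "12345")).waitFor();
    }
}
{code}

If we go this way, the hard part is not the call itself but the checks the wrapper must enforce (e.g. only acting on paths and pids belonging to a job that is live on this node), so that it does not become a general-purpose su.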