[ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653375#action_12653375 ]
Hemanth Yamijala commented on HADOOP-4490:
------------------------------------------

I have been able to make some progress and get a wordcount job to run as the job submitter. The design follows the basic approach mentioned above, minus the plugin abstraction, which I have yet to create.

- Created a setuid C executable.
- This executable currently takes the following commands:
-- SETUP_DIRS <list of directories>: This command sets up task-specific directories to be owned by the user. The general approach I followed for handling directory permissions is that the root directories, such as hadoop.tmp.dir/mapred/local/taskTracker/jobcache/jobid, would be owned by the tasktracker daemon, which creates task directories under them when needed. The taskcontroller exe then changes the ownership and permissions of the task directory and its subfolders to the user.
-- RUN_TASK <path to a file containing the M/R task to execute>: The file is a temp file created under the user's work directory itself, executable by the user.
-- MOVE_FILES <source directory> <destination directory>: This command is used to copy the intermediate output and task logs from the task directories to a system-specific directory owned by the daemon. The servlets serving this data are modified to read from the system-specific directory.
- These are called from the JvmManager class at appropriate places.

A couple of things came up when doing this:

- Task logs: Currently task logs can be viewed while the task is still executing. Further, the task logs are read by the TaskLogServlet, which runs in the daemon context. We want the task logs to be owned by the user. I still need to figure out how to achieve this. Currently, I am only able to access task logs after the tasks are done, by executing the MOVE_FILES command. Any ideas are welcome.
- JVM Reuse: Currently, I've only handled this with one JVM per task. Need to check the approach when JVM reuse is in the picture.
- Still need to work on cleanup and kill actions, as well as the distributed cache.

The code I have is very raw and needs a lot of polishing even as a first draft. Will try to do so in a couple of days. Any comments on the approach so far?

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
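
For readers following the comment above, here is a rough sketch of how the three commands it names (SETUP_DIRS, RUN_TASK, MOVE_FILES) might be recognised and sanity-checked before any privileged work is done. This is not the code from the patch: the enum, the function names tc_parse_command and tc_args_valid, and the rule that SETUP_DIRS needs at least one directory are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Commands named in the comment. The enum itself is hypothetical. */
typedef enum {
    CMD_UNKNOWN = 0,
    CMD_SETUP_DIRS,
    CMD_RUN_TASK,
    CMD_MOVE_FILES
} tc_command;

/* Map a command word to its enum value; returns CMD_UNKNOWN if the
 * word is not one of the three commands. */
tc_command tc_parse_command(const char *name)
{
    assert(name != NULL);
    if (strcmp(name, "SETUP_DIRS") == 0) return CMD_SETUP_DIRS;
    if (strcmp(name, "RUN_TASK") == 0)   return CMD_RUN_TASK;
    if (strcmp(name, "MOVE_FILES") == 0) return CMD_MOVE_FILES;
    return CMD_UNKNOWN;
}

/* Check the number of operands that follow the command word.
 * Returns 1 if the count is acceptable, 0 otherwise.
 *   SETUP_DIRS: one or more directories (assumed minimum of one)
 *   RUN_TASK:   exactly one path to the task file
 *   MOVE_FILES: exactly a source and a destination directory */
int tc_args_valid(tc_command cmd, int nargs)
{
    switch (cmd) {
    case CMD_SETUP_DIRS: return nargs >= 1;
    case CMD_RUN_TASK:   return nargs == 1;
    case CMD_MOVE_FILES: return nargs == 2;
    default:             return 0;
    }
}
```

A main() in such an executable could call tc_parse_command(argv[1]) and tc_args_valid(cmd, argc - 2), and refuse to continue on CMD_UNKNOWN or a bad operand count. Rejecting malformed input early seems prudent for a setuid binary, since everything after that point runs with elevated privileges.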