[jira] Commented: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Hemanth Yamijala (JIRA) Thu, 04 Dec 2008 09:33:40 -0800

    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653375#action_12653375 ]


Hemanth Yamijala commented on HADOOP-4490:
------------------------------------------

I have been able to make some progress and get a wordcount job to run as the 
job submitter. The design follows the basic approach mentioned above, minus the 
plugin abstraction, which I have yet to create.

- Created a setuid C executable.
- This executable currently takes the following commands:
-- SETUP_DIRS <list of directories>:
   This command sets up task-specific directories to be owned by the user. The 
general approach I followed for directory permissions is that the root 
directories, such as hadoop.tmp.dir/mapred/local/taskTracker/jobcache/jobid, 
are owned by the tasktracker daemon, which creates task directories under 
them when needed. The taskcontroller executable then changes the ownership and 
permissions of the task directory and its subdirectories to the user.
-- RUN_TASK <path to a file containing the M/R task to execute>:
   The file is a temporary file, executable by the user, created under the 
user's work directory.
-- MOVE_FILES <source directory> <destination directory>:
   This command is used to copy the intermediate output and task logs from the 
task directories to a system-specific directory owned by the daemon. The 
servlets serving this data are modified to read from the system-specific 
directory.
- These are called from the JvmManager class at appropriate places.
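
To make the command interface above concrete, here is a minimal sketch of how 
such an executable might recognise its commands and hand a single task 
directory over to the job submitter. Only the command names and their 
arguments come from the description above; the enum, the function names, the 
0700 mode, and the lack of recursion into subdirectories are assumptions made 
for illustration, not the code in the actual patch.

```c
#include <pwd.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Command names are from the comment above; the enum and all function
 * names below are hypothetical. */
enum tc_command { TC_SETUP_DIRS, TC_RUN_TASK, TC_MOVE_FILES, TC_UNKNOWN };

/* Map the first command-line argument to a command. */
enum tc_command tc_parse_command(const char *name)
{
    if (name == NULL)
        return TC_UNKNOWN;
    if (strcmp(name, "SETUP_DIRS") == 0)
        return TC_SETUP_DIRS;
    if (strcmp(name, "RUN_TASK") == 0)
        return TC_RUN_TASK;
    if (strcmp(name, "MOVE_FILES") == 0)
        return TC_MOVE_FILES;
    return TC_UNKNOWN;
}

/* Minimum number of arguments expected after the command name,
 * or -1 for an unknown command. */
int tc_min_args(enum tc_command cmd)
{
    switch (cmd) {
    case TC_SETUP_DIRS: return 1;  /* at least one directory */
    case TC_RUN_TASK:   return 1;  /* path to the task file */
    case TC_MOVE_FILES: return 2;  /* source and destination */
    default:            return -1;
    }
}

/* Give one directory to the named user: chown to the user's uid and
 * primary gid, then restrict it to the owner (mode 0700, an assumption).
 * Does not recurse. Returns 0 on success, -1 on failure. */
int tc_give_dir_to_user(const char *dir, const char *user)
{
    struct passwd *pw = getpwnam(user);
    if (pw == NULL)
        return -1;
    if (chown(dir, pw->pw_uid, pw->pw_gid) != 0)
        return -1;
    return chmod(dir, S_IRWXU);
}
```

A real setuid binary would also need to validate every path it is given, so 
that a caller cannot use it to take ownership of arbitrary directories, and 
to drop privileges before running the task.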

A couple of things came up when doing this:

- Task logs: Currently, task logs can be viewed while the task is still 
executing. Further, the task logs are read by the TaskLogServlet, which 
runs in the daemon context. We want the task logs to be owned by the user. I 
still need to figure out how to achieve this. Currently, I am only able to 
access task logs after the task is done, by executing the MOVE_FILES command. 
Any ideas are welcome.

- JVM Reuse: Currently, I've only handled the case of one JVM per task. I need 
to check how the approach holds up when JVM reuse is in the picture.

- I still need to work on cleanup and kill actions, as well as the distributed 
cache.

The code I have is very raw and needs a lot of polishing even as a first draft. 
I will try to do so in the next couple of days. Any comments on the approach 
so far?

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them 
> running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
