[ 
https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655207#action_12655207
 ] 

Hemanth Yamijala commented on HADOOP-4490:
------------------------------------------

I had an offline discussion with Devaraj regarding the implementation, and we 
also went over the impact this would have when combined with JVM reuse.

A few comments from him that I am documenting here:
- Task directories under the tasktracker system or root directory to which 
files (such as intermediate outputs) are copied after task completion should be 
on the same disk as the original user's task directories. This avoids 
cross-disk copies.
- Regarding the problem of serving log outputs, which I mentioned 
[here|#action_12653375], we discussed one approach: add a command 
to the executable that reads the data and returns it to the TaskLogServlet 
on demand. This would happen reasonably rarely and would not affect any other 
functionality, so the performance overhead can probably be ignored.
- Another comment was to reduce the number of times the executable is launched. 
For example, *without* JVM reuse, I can set up the directories, run the task, 
and then move the outputs with a single launch of the executable. This is 
possible because all actions are per task, and there is one JVM per task. Hence 
the lifecycle of the task fits well with the setuid changes.
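
To make the first point concrete, here is a rough sketch of how a destination 
root on the same disk as the user's task directory might be picked. It is only 
an illustration: the class and method names are hypothetical, and it uses the 
java.nio.file API from later Java versions rather than anything in the current 
TaskTracker code.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;

public class SameDiskChooser {
    /**
     * Returns the first candidate root that lives on the same file store
     * (disk/partition) as the user's task directory, if any.
     * Hypothetical helper; not part of Hadoop.
     */
    static Optional<Path> chooseRootOnSameDisk(Path taskDir, List<Path> candidateRoots)
            throws IOException {
        FileStore taskStore = Files.getFileStore(taskDir);
        for (Path root : candidateRoots) {
            // A move within one file store can be a rename instead of a copy.
            if (Files.getFileStore(root).equals(taskStore)) {
                return Optional.of(root);
            }
        }
        return Optional.empty();
    }
}
```

If no candidate shares the file store, the caller would have to fall back to a 
cross-disk copy, which is the case the comment is trying to avoid.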

With JVM reuse though, the last point becomes problematic. We can easily set up 
the directories before the task and move the output after it. However, that 
needs to be done with a separate launch of the executable - three times 
in fact. The performance impact of this, and whether it would offset the 
advantage of JVM reuse, is something to measure and see.
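
The difference in launch counts can be sketched as below. This is one reading 
of the "three times" remark, namely that each step becomes its own launch when 
JVMs are reused. The step names, argument layout and class name are all 
assumptions made for illustration; the actual command set of the executable is 
not decided here.

```java
import java.util.ArrayList;
import java.util.List;

public class LaunchPlan {
    // Hypothetical step names; the executable's real commands are not defined yet.
    enum Step { SETUP_DIRS, RUN_TASK, MOVE_OUTPUTS }

    /**
     * Builds the argument list for each launch of the (hypothetical) setuid
     * executable needed to run one task.
     */
    static List<List<String>> planLaunches(String executable, String user,
                                           String taskId, boolean jvmReuse) {
        List<List<String>> launches = new ArrayList<>();
        if (!jvmReuse) {
            // One JVM per task: all steps can share a single launch.
            launches.add(List.of(executable, user, taskId,
                    "SETUP_DIRS,RUN_TASK,MOVE_OUTPUTS"));
        } else {
            // Assumed reading: with JVM reuse each step needs its own launch.
            for (Step step : Step.values()) {
                launches.add(List.of(executable, user, taskId, step.name()));
            }
        }
        return launches;
    }
}
```

Under these assumptions a task costs one launch without JVM reuse and three 
with it, which is the overhead that would need to be measured.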

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them 
> running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
