[ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655207#action_12655207 ]
Hemanth Yamijala commented on HADOOP-4490:
------------------------------------------

I had an offline discussion with Devaraj regarding the implementation, and we also went over the impact this would have when combined with JVM reuse. A few comments from him that I am documenting here:

- Task directories under the tasktracker system or root directory, to which files (such as intermediate outputs) are copied after task completion, should be on the same disk as the original user's task directories. This is to prevent cross-disk copies.
- Regarding the problem of serving log outputs, which I've mentioned [here|#action_12653375], we discussed that one approach could be to have a command in the executable that reads the data and returns it to the TaskLogServlet on demand. This would happen reasonably rarely and does not affect any other functionality. Hence it seems the performance overhead can be ignored.
- Another comment was to reduce the number of times the executable is launched. For example, *without* JVM reuse, I can set up the directories, run the task, and then move the outputs with a single launch of the executable. This is possible because all actions are per task, and there is one JVM per task. Hence the lifecycle of the task fits well with the setuid changes.

With JVM reuse, though, the last point becomes problematic. We can easily set up the directories before the task and move the output after it. However, that needs to be done with separate launches of the executable, three in all. The performance impact of this (and whether it would offset the advantage of JVM reuse) is something to measure.
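To make the launch-count concern concrete, here is a rough sketch of the arithmetic in Python. It is only a model under stated assumptions, not the actual implementation: the function names are hypothetical, and it assumes that with JVM reuse each reused JVM costs one launch of the executable while every task still needs one launch to set up its directories and one to move its outputs.

```python
def launches_without_reuse(num_tasks):
    """One launch per task covers setup, run, and moving outputs."""
    if num_tasks < 0:
        raise ValueError("num_tasks must be non-negative")
    return num_tasks


def launches_with_reuse(num_tasks, tasks_per_jvm):
    """Assumed model: one launch per reused JVM, plus two per task
    (directory setup before the task, output move after it)."""
    if num_tasks < 0:
        raise ValueError("num_tasks must be non-negative")
    if tasks_per_jvm < 1:
        raise ValueError("tasks_per_jvm must be at least 1")
    jvms = -(-num_tasks // tasks_per_jvm)  # ceiling division
    return jvms + 2 * num_tasks


if __name__ == "__main__":
    for n, per_jvm in [(1, 1), (10, 5), (100, 10)]:
        print(n, per_jvm, launches_without_reuse(n), launches_with_reuse(n, per_jvm))
```

Under this model a single task in a reused JVM needs three launches, matching the count above, and the total grows at roughly twice the number of tasks. Whether that cost outweighs the JVM startup time saved is exactly what would need to be measured.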
> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them
> running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.