[ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12674476#action_12674476 ]
Hemanth Yamijala commented on HADOOP-4490:
------------------------------------------

Thanks for the review, Arun. I had a discussion with Sreekanth about the changes, and we are proposing the following:

- Introduce a {{TaskTracker.initializeSystemDirs}}. This will create $mapred.local.dir/taskTracker/jobCache/, $mapred.local.dir/taskTracker/archives, and $hadoop.log.dir/userlogs on all relevant disks. Currently, as per Arun's comments, we'll have this API in TaskTracker, and it will be called at tracker initialization time. If it is felt that this should be per TaskController, then we can easily move it to the TaskController API. I think this may currently need 777 on the $mapred.local.dir/taskTracker/jobCache/ directory, because files would be created by both the task and the tracker; e.g. the task could create the output directories on a new disk which has not yet been touched by the tracker.
- Introduce a {{TaskController.initializeJob}}. This will be called from {{TaskTracker.localizeJob}}, with the jobid as parameter. It will set up access for the $mapred.local.dir/taskTracker/jobCache/jobid directories on all disks which have been touched by localization.
- Modify {{TaskController.launchTaskJVM}} to set up permissions for the log dir and the pid dir associated with that task. This will remove the call to {{initializeTask}} from the {{JvmManager.runChild}} API.
- Modify {{TaskController.initializeTask}} to set up permissions for the log dir, pid dir, and task cache dir for the task. There is no need to set up things for the job, because that has already been done in {{initializeJob}}. We will need to repeat the permission setting for the log dir and pid dir.
- Modify {{DistributedCache.localizeCache}} to set up permissions for the localized files. We propose to recursively set up 755 permissions (hardcoded) for all files under the $mapred.local.dir/taskTracker/archive/ directory for now.
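
To make the last item concrete, here is a minimal sketch of what a recursive 755 pass could look like. It is not taken from the patch: the class and method names are hypothetical, and it assumes {{java.nio.file}} on a POSIX file system rather than whatever helper the actual change ends up using.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for illustration only; not part of the HADOOP-4490 patch.
final class ArchivePermissions {
    private ArchivePermissions() {}

    /**
     * Applies the given POSIX permissions (e.g. "rwxr-xr-x" for 755) to root
     * and to every file and directory below it.
     *
     * @return the number of paths whose permissions were set
     */
    static int setRecursively(Path root, String perms) throws IOException {
        Set<PosixFilePermission> mode = PosixFilePermissions.fromString(perms);

        // Collect the tree first so the walk is finished before any mode changes.
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            paths = walk.collect(Collectors.toList());
        }

        // Every path is touched, including ones that already have the right mode.
        for (Path p : paths) {
            Files.setPosixFilePermissions(p, mode);
        }
        return paths.size();
    }
}
```

A caller would invoke something like {{ArchivePermissions.setRecursively(archiveDir, "rwxr-xr-x")}} after each localization, where {{archiveDir}} stands for the archive directory on one disk.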

This recursive approach might repeatedly set permissions on files that are already set up correctly. However, it will keep things simple. If there's a performance issue, it is easy to address by setting permissions only for the files being localized and by walking up their parent paths. Please let me know if this seems a bad choice.

The above changes will mean we can remove:

- DistributedCacheFileAccessInfo
- FileUtil.setPermissionsForPathComponents
- TaskController.cleanup
- The runningJobs state maintained by LinuxTaskController.

Arun, does this tie in with your expectations?

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>         Attachments: hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.