[ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12674476#action_12674476 ]

Hemanth Yamijala commented on HADOOP-4490:
------------------------------------------

Thanks for the review Arun.

I had a discussion with Sreekanth about the changes, and we are proposing the 
following:

- Introduce a {{TaskTracker.initializeSystemDirs}}. This will create 
$mapred.local.dir/taskTracker/jobCache/, 
$mapred.local.dir/taskTracker/archive/, and $hadoop.log.dir/userlogs on all 
relevant disks.
Currently, as per Arun's comments, we'll have this API in TaskTracker, and it 
will be called at TaskTracker initialization time. If it is felt that this 
should be per TaskController, we can easily move it to the TaskController API. 
I think the $mapred.local.dir/taskTracker/jobCache/ directory may currently 
need 777 permissions, because files under it would be created both by the task 
and by the tracker: e.g. the task could create output directories on a new 
disk that has not yet been touched by the tracker.

- Introduce a {{TaskController.initializeJob}}. This will be called from 
{{TaskTracker.localizeJob}}, with the jobid as a parameter. It will set up 
access for the $mapred.local.dir/taskTracker/jobCache/jobid directories on all 
disks that have been touched by localization.

- Modify {{TaskController.launchTaskJVM}} to set up permissions for the log dir 
and the pid dir associated with the task. This will remove the call to 
{{initializeTask}} from the {{JvmManager.runChild}} API.

- Modify {{TaskController.initializeTask}} to set up permissions for the log 
dir, pid dir, and task cache dir for the task. There is no need to set up 
anything for the job, because that has already been done in {{initializeJob}}. 
The permission setting for the log dir and pid dir will therefore be repeated 
here as well as in {{launchTaskJVM}}.

- Modify {{DistributedCache.localizeCache}} to set up permissions for the 
localized files. For now, we propose to recursively set 755 permissions 
(hardcoded) on all files under the $mapred.local.dir/taskTracker/archive/ 
directory. This might repeatedly set permissions on files that are already 
correctly set up, but it keeps things simple. If there's a performance issue, 
it is easy to address by setting permissions only on the files being localized 
and walking up their parent paths. Please let me know if this seems like a bad 
choice.
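A minimal Java sketch of the filesystem side of the first and last points 
(creating the system directories with 777 on jobCache, and the recursive 755 
under archive), assuming a POSIX filesystem and using only java.nio.file. The 
class name LocalDirSetup and its method names are hypothetical, not the 
TaskTracker or DistributedCache API, and the per-user ownership that 
LinuxTaskController would handle is not modeled.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.List;
import java.util.Set;
import java.util.stream.Stream;

// Hypothetical sketch, not Hadoop code: assumes a POSIX filesystem and
// models only the directory layout and permission bits, not file ownership.
public class LocalDirSetup {

    // 777 on jobCache: both the tracker and the task may create entries there.
    static final Set<PosixFilePermission> JOB_CACHE_PERMS =
        PosixFilePermissions.fromString("rwxrwxrwx");

    // 755, hardcoded, for everything under the distributed cache archive dir.
    static final Set<PosixFilePermission> ARCHIVE_PERMS =
        PosixFilePermissions.fromString("rwxr-xr-x");

    // Analogue of the proposed initializeSystemDirs, run once at tracker startup.
    static void initializeSystemDirs(List<Path> localDirs, Path logDir)
            throws IOException {
        for (Path localDir : localDirs) {
            Path base = localDir.resolve("taskTracker");
            Path jobCache = Files.createDirectories(base.resolve("jobCache"));
            // Set the bits explicitly; createDirectories is subject to the umask.
            Files.setPosixFilePermissions(jobCache, JOB_CACHE_PERMS);
            Files.createDirectories(base.resolve("archive"));
        }
        Files.createDirectories(logDir.resolve("userlogs"));
    }

    // Simple version of the localizeCache proposal: re-apply perms to every
    // path under root (root included), even ones that are already correct.
    static void setPermissionsRecursively(Path root, Set<PosixFilePermission> perms)
            throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            for (Path p : (Iterable<Path>) paths::iterator) {
                Files.setPosixFilePermissions(p, perms);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("tt-sketch");
        List<Path> disks = List.of(tmp.resolve("disk1"), tmp.resolve("disk2"));
        initializeSystemDirs(disks, tmp.resolve("logs"));

        Path archive = disks.get(0).resolve("taskTracker").resolve("archive");
        Path jar = Files.createFile(archive.resolve("lib.jar"));
        setPermissionsRecursively(archive, ARCHIVE_PERMS);

        // Prints rwxr-xr-x
        System.out.println(PosixFilePermissions.toString(
            Files.getPosixFilePermissions(jar)));
    }
}
```

Walking the whole archive tree on every localization is the simple option 
proposed here; the optimization of touching only the newly localized files and 
their parent paths would replace the Files.walk loop with a loop over just 
those paths.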

The above changes will mean we can remove:

- {{DistributedCacheFileAccessInfo}}
- {{FileUtil.setPermissionsForPathComponents}}
- {{TaskController.cleanup}}
- The {{runningJobs}} state maintained by {{LinuxTaskController}}.

Arun, does this tie in with your expectations?

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>         Attachments: hadoop-4490-design.pdf, HADOOP-4490.patch, 
> HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, 
> HADOOP-4490.patch, HADOOP-4490.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them 
> running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.