[ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695339#action_12695339 ]
Hemanth Yamijala commented on HADOOP-4490:
------------------------------------------

Some comments:
- Use getLocalJobDir in LinuxTaskController.localizeJob.
- Whenever mkdir or mkdirs fails, we should continue from the loops.
- The changes in TaskRunner seem unnecessary.
- The changes in DistributedCache to pass the baseDir seem unnecessary. Note that localizeCache already takes a CacheStatus object that has the baseDir.
- This comment is not incorporated: "Modify TaskController.launchTaskJVM to set up permissions for the log dir and the pid dir associated with that task. This will remove the call to initializeTask from the JvmManager.runChild API." I think this can be done by calling setup*FileAccess from launchTaskJVM.
- localizeJob can be called initializeJob.
- In setupTaskCacheFileAccess, we are setting permissions recursively from the job directory. But that is already done in LinuxTaskController.localizeJob, so we should be setting permissions from the taskCacheDirectory only.
- writeCommand should ideally check for the existence of the file before it tries to change permissions in the finally clause (see the sketch after these comments).
- JvmManagerForType.getTaskForJvm() - "Incase of JVM reuse, tasks returned previously launched" - there is a grammatical mistake here.
- In the kill path, I think it would be nice to add an info-level log message when we are not doing the kill, both in the JVM manager and in LinuxTaskController.
- TaskLog.getLogDir() - make this getUserLogDir(), and the javadoc need not mention TaskControllers. It should be generic documentation stating that the method returns the base location for the user logs.
- mapred-defaults.xml should have the config variable for the task controller, along with documentation.

Comments on documentation:
- I think we should first describe the use case the task controllers are trying to solve, namely the requirement to run tasks as the job owner.
- It would be nice to give a short description of how the LinuxTaskController works - something like: we use a setuid executable, and the TaskTracker uses this executable to launch and kill tasks (see the sketch after these comments).
- We should definitely mention that until other JIRAs like HADOOP-4491 are fixed, we open up permissions on the intermediate, localized and log files in the LinuxTaskController case.
- Making the executable a setuid executable is a deployment step. It is currently added as a build step.
- For the path to the taskcontroller cfg, mention that this should be the path on the cluster nodes where the deployment of the taskcontroller.cfg file will happen.
- We should also mention that the LinuxTaskController is currently supported only on Linux (though it sounds obvious).
- Should we mention permissions for mapred.local.dir and hadoop.log.dir (they should be 777, and the paths leading up to them should be 755)?
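To illustrate the writeCommand point, here is a minimal, hypothetical sketch (not the actual Hadoop code; the method name, arguments and permission calls are placeholders) of guarding the permission change in the finally clause with an existence check:

{code:java}
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

// Hypothetical sketch only: guard the permission change in the finally clause.
public class CommandWriterSketch {

  static void writeCommand(String cmd, File commandFile) throws IOException {
    PrintWriter out = null;
    try {
      out = new PrintWriter(new FileWriter(commandFile));
      out.println(cmd);
    } finally {
      if (out != null) {
        out.close();
      }
      // Only change permissions if the file was actually created; an earlier
      // failure (e.g. the FileWriter constructor throwing) may have left no file.
      if (commandFile.exists()) {
        commandFile.setReadable(true, false);   // world-readable
        commandFile.setExecutable(true, false); // world-executable
      }
    }
  }
}
{code}

For the documentation point about how the LinuxTaskController works, a rough sketch of the idea, assuming a setuid task-controller binary: the binary path, sub-command name and argument order here are invented for illustration and are not the real task-controller interface.

{code:java}
import java.io.IOException;

// Illustrative sketch: the TaskTracker shells out to a setuid binary that
// switches to the job owner before running the task launch script.
public class SetuidLaunchSketch {

  private final String taskControllerExe; // path to the setuid executable

  public SetuidLaunchSketch(String taskControllerExe) {
    this.taskControllerExe = taskControllerExe;
  }

  /** Run the task launch script as the given user via the setuid binary. */
  public int launchAsUser(String user, String launchScript)
      throws IOException, InterruptedException {
    ProcessBuilder pb =
        new ProcessBuilder(taskControllerExe, user, "launch", launchScript);
    pb.redirectErrorStream(true);
    Process p = pb.start();
    return p.waitFor(); // a non-zero exit code means the launch failed
  }
}
{code}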
> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>             Fix For: 0.21.0
>
>         Attachments: cluster_setup.pdf, HADOOP-4490-1.patch, HADOOP-4490-1.patch, HADOOP-4490-2.patch, HADOOP-4490-3.patch, hadoop-4490-4.patch, hadoop-4490-5.patch, hadoop-4490-6.patch, hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490_streaming.patch
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.