[ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695339#action_12695339 ]
Hemanth Yamijala commented on HADOOP-4490:
------------------------------------------

Some comments:
- Use getLocalJobDir in LinuxTaskController.localizeJob.
- Whenever mkdir or mkdirs fails, we should continue from the loops.
- The changes in TaskRunner seem unnecessary.
- The changes in DistributedCache to pass the baseDir seem unnecessary. Note that localizeCache already takes a CacheStatus object that has the baseDir.
- This comment is not incorporated: "Modify TaskController.launchTaskJVM to set up permissions for the log dir and the pid dir associated with that task. This will remove the call to initializeTask from the JvmManager.runChild API." I think this can be done by calling setup*FileAccess from launchTaskJVM.
- localizeJob can be called initializeJob.
- In setupTaskCacheFileAccess, we are setting permissions recursively from the job directory. But that is already done in LinuxTaskController.localizeJob, so we should be setting permissions from the taskCacheDirectory only.
- writeCommand should ideally check for the existence of the file before it tries to change permissions in the finally clause (see the sketch after these comments).
- JvmManagerForType.getTaskForJvm() - "Incase of JVM reuse, tasks returned previously launched" - there is a grammatical mistake here.
- In the kill path, I think it would be nice to add an info-level log message when we are not doing the kill, both in the JVM manager and in LinuxTaskController.
- TaskLog.getLogDir() - make this getUserLogDir(), and the javadoc need not mention TaskControllers. It should be generic documentation stating that the method returns the base location for the user logs.
- mapred-defaults.xml should have the config variable for the task controller, along with documentation.

Comments on documentation:
- I think we should first describe the use case the task controllers are trying to solve, namely the requirement to run tasks as the job owner.
- It would be nice to give a short description of how the LinuxTaskController works - something like: we use a setuid executable, and the TaskTracker uses this executable to launch and kill tasks (see the sketch after these comments).
- We should definitely mention that until other JIRAs like HADOOP-4491 are fixed, we open up permissions on the intermediate, localized and log files in the LinuxTaskController case.
- Making the executable a setuid executable is a deployment step. It is currently added as a build step.
- For the path to the taskcontroller cfg, mention that this should be the path on the cluster nodes where the deployment of the taskcontroller.cfg file will happen.
- We should also mention that the LinuxTaskController is currently supported only on Linux (though it sounds obvious).
- Should we mention permissions for mapred.local.dir and hadoop.log.dir (they should be 777, and the paths leading up to them should be 755)?
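To illustrate the writeCommand point, here is a minimal, hypothetical sketch (not the actual Hadoop code; the method name, arguments and permission calls are placeholders) of guarding the permission change in the finally clause with an existence check:

{code:java}
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

// Hypothetical sketch only: guard the permission change in the finally clause.
public class CommandWriterSketch {

  static void writeCommand(String cmd, File commandFile) throws IOException {
    PrintWriter out = null;
    try {
      out = new PrintWriter(new FileWriter(commandFile));
      out.println(cmd);
    } finally {
      if (out != null) {
        out.close();
      }
      // Only change permissions if the file was actually created; an earlier
      // failure (e.g. the FileWriter constructor throwing) may have left no file.
      if (commandFile.exists()) {
        commandFile.setReadable(true, false);   // world-readable
        commandFile.setExecutable(true, false); // world-executable
      }
    }
  }
}
{code}

For the documentation point about how the LinuxTaskController works, a rough sketch of the idea, assuming a setuid task-controller binary: the binary path, sub-command name and argument order here are invented for illustration and are not the real task-controller interface.

{code:java}
import java.io.IOException;

// Illustrative sketch: the TaskTracker shells out to a setuid binary that
// switches to the job owner before running the task launch script.
public class SetuidLaunchSketch {

  private final String taskControllerExe; // path to the setuid executable

  public SetuidLaunchSketch(String taskControllerExe) {
    this.taskControllerExe = taskControllerExe;
  }

  /** Run the task launch script as the given user via the setuid binary. */
  public int launchAsUser(String user, String launchScript)
      throws IOException, InterruptedException {
    ProcessBuilder pb =
        new ProcessBuilder(taskControllerExe, user, "launch", launchScript);
    pb.redirectErrorStream(true);
    Process p = pb.start();
    return p.waitFor(); // a non-zero exit code means the launch failed
  }
}
{code}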
> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>             Fix For: 0.21.0
>
>         Attachments: cluster_setup.pdf, HADOOP-4490-1.patch, HADOOP-4490-1.patch, HADOOP-4490-2.patch, HADOOP-4490-3.patch, hadoop-4490-4.patch, hadoop-4490-5.patch, hadoop-4490-6.patch, hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490_streaming.patch
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.