[ https://issues.apache.org/jira/browse/HADOOP-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716620#action_12716620 ]
Vinod K V commented on HADOOP-4491:
-----------------------------------

Here is a summary of the current state of local data management on the TT.

*Localization of a task on the TT:*
- Job localization
-- happens once per job
-- recursively creates the taskTracker/jobcache/jobid and taskTracker/jobcache/jobid/work directories
-- downloads job.xml to taskTracker/jobcache/jobid/job.xml and the job jar to taskTracker/jobcache/jobid/jars/job.jar
- Task localization
-- happens once per task
-- recursively creates the task's work directory: taskTracker/jobcache/jobid/taskid[.cleanup]/work
-- if needed, localizes distributed-cache archives and files to taskTracker/archive and/or creates symlinks in the task's work directory, and rewrites job.xml
-- creates the mapred.child.tmp directory
-- recursively creates hadoop.log.dir|userlogs/taskid/ and sets up the child to redirect its stdout and stderr to this directory
-- creates taskjvm.sh when the Linux task controller is used

*Intermediate output files*
- All intermediate files created by tasks (which run as the user) go under taskTracker/jobcache/jobid/taskid[.cleanup]/output/, which the child JVM creates recursively on demand.

*Intermediate output serving*
- The TaskTracker directly reads the map-intermediate output files from taskTracker/jobcache/jobid/taskid[.cleanup]/output/ and serves them to reduces via MapOutputServlet.

*Task logs' serving*
- Each task's syslog is created by the child JVM as hadoop.log.dir|userlogs/taskid/syslog.
- The TaskTracker directly reads the tasks' logs and serves them via TaskLogServlet.

In summary, the current directory structure is shown in the tree below. Unless otherwise stated, directories and files are owned by the TT but used by both the TT and the child.
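To make the localization steps above concrete, here is a minimal Python sketch of the paths they produce. It is not TaskTracker code: the helper names (`job_dir`, `task_dir`, `localize_job`, `localize_task`) and the sample IDs are hypothetical, it assumes `hadoop.log.dir|userlogs` means `<hadoop.log.dir>/userlogs`, and it only creates directories. It does not download job.xml or the job jar, localize the distributed cache, create `mapred.child.tmp`, write taskjvm.sh, or set any ownership or permissions.

```python
import os
import tempfile

# Hypothetical helpers mirroring the layout described in the comment.
# They are an illustration only, not the TaskTracker's implementation.

def job_dir(root: str, job_id: str) -> str:
    """taskTracker/jobcache/<jobid>"""
    return os.path.join(root, "taskTracker", "jobcache", job_id)

def task_dir(root: str, job_id: str, task_id: str, cleanup: bool = False) -> str:
    """taskTracker/jobcache/<jobid>/<taskid>[.cleanup]"""
    name = task_id + (".cleanup" if cleanup else "")
    return os.path.join(job_dir(root, job_id), name)

def localize_job(root: str, job_id: str) -> dict:
    """Once per job: recursively create the job dir and its work dir.

    Returns the locations job.xml and the job jar would be downloaded to;
    the downloads themselves are not performed here."""
    jd = job_dir(root, job_id)
    os.makedirs(os.path.join(jd, "work"), exist_ok=True)
    os.makedirs(os.path.join(jd, "jars"), exist_ok=True)  # parent of job.jar
    return {
        "job_xml": os.path.join(jd, "job.xml"),
        "job_jar": os.path.join(jd, "jars", "job.jar"),
    }

def localize_task(root: str, log_dir: str, job_id: str, task_id: str,
                  cleanup: bool = False) -> dict:
    """Once per task: create the task's work dir and its per-task log dir.

    ASSUMPTION: log_dir stands in for hadoop.log.dir."""
    td = task_dir(root, job_id, task_id, cleanup)
    work = os.path.join(td, "work")
    os.makedirs(work, exist_ok=True)
    logs = os.path.join(log_dir, "userlogs", task_id)
    os.makedirs(logs, exist_ok=True)
    return {
        "work": work,
        # output/ is created on demand by the child JVM, so it is NOT created here.
        "output": os.path.join(td, "output"),
        "stdout": os.path.join(logs, "stdout"),
        "stderr": os.path.join(logs, "stderr"),
        "syslog": os.path.join(logs, "syslog"),
    }

if __name__ == "__main__":
    # Made-up job and attempt IDs, used only to show the resulting paths.
    with tempfile.TemporaryDirectory() as base:
        root = os.path.join(base, "mapred-local")
        logs = os.path.join(base, "logs")
        localize_job(root, "job_200906041200_0001")
        task = localize_task(root, logs, "job_200906041200_0001",
                             "attempt_200906041200_0001_m_000000_0")
        print(os.path.relpath(task["work"], root))
```

Keeping the path construction in one place makes it easy to see which directories the TT creates up front and which (like `output/`) the child creates later, which is the distinction the ownership annotations in the tree are about.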
{noformat}
taskTracker
  |- archive
  |- jobcache
      |- jobid
          |- work
          |- job.xml
          |- jars/job.jar
          |- taskid[.cleanup]
              |- work
              |- taskjvm.sh (created and used by TT)
              |- output (owned by child, used by TT)
                  |- all intermediate output files (owned by child, used by TT)

mapred.child.tmp (owned and used by child)

hadoop.log.dir|userlogs (owned and used by child)
  |- taskid (owned and used by child)
      |- stdout (owned by child, used by TT)
      |- stderr (owned by child, used by TT)
      |- syslog (owned by child, used by TT)
{noformat}

> Per-job local data on the TaskTracker node should have right access-control
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-4491
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4491
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Vinod K V