[
https://issues.apache.org/jira/browse/HADOOP-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716620#action_12716620
]
Vinod K V commented on HADOOP-4491:
-----------------------------------
I am summarizing the current state of local data management on the TaskTracker (TT).
*Localization of task on the TT:*
- Job localization
-- happens once per job
-- recursively creates the taskTracker/jobcache/jobid and
taskTracker/jobcache/jobid/work directories
-- downloads job.xml to taskTracker/jobcache/jobid/job.xml and the job jar to
taskTracker/jobcache/jobid/jars/job.jar
- Task localization
-- happens once per task
-- creates task's work directory recursively:
taskTracker/jobcache/jobid/taskid[.cleanup]/work
-- if needed, localizes archives and files for the distributed cache to
taskTracker/archive and/or creates symlinks in the task's work directory, and
rewrites job.xml
-- creates mapred.child.tmp directory
-- recursively creates hadoop.log.dir|userlogs/taskid/ and sets up the child to
redirect its stdout and stderr to this directory
-- creates taskjvm.sh when the Linux task controller is used
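The directory-creation part of the two localization steps above can be sketched as follows. This is only an illustration of the layout, not the TaskTracker's actual code: the class LocalizationSketch, its method names, and the localRoot parameter are hypothetical, and the downloads, distributed-cache handling and log setup are left out.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration only; these names do not exist in Hadoop.
final class LocalizationSketch {
    private LocalizationSketch() {}

    /** Job localization: runs once per job. Returns the job directory. */
    static Path localizeJob(Path localRoot, String jobId) throws IOException {
        Path jobDir = localRoot.resolve("taskTracker").resolve("jobcache").resolve(jobId);
        // createDirectories also creates any missing parents (the "recursive" part).
        Files.createDirectories(jobDir.resolve("work"));
        // job.jar would be downloaded into jars/; job.xml goes directly under jobDir.
        Files.createDirectories(jobDir.resolve("jars"));
        return jobDir;
    }

    /** Task localization: runs once per task. Returns the task's work directory. */
    static Path localizeTask(Path jobDir, String taskId, boolean isCleanup) throws IOException {
        // Cleanup attempts get the ".cleanup" suffix: taskid[.cleanup]/work
        String taskDirName = isCleanup ? taskId + ".cleanup" : taskId;
        return Files.createDirectories(jobDir.resolve(taskDirName).resolve("work"));
    }
}
```

A caller would run localizeJob once per job and then localizeTask for each task of that job, passing the returned job directory.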
*Intermediate output files*
- All intermediate files are created by tasks, running as the user, in
taskTracker/jobcache/jobid/taskid[.cleanup]/output/, which is created
recursively on demand by the child jvm.
*Intermediate output serving*
- TaskTracker directly reads the map-intermediate output files from
taskTracker/jobcache/jobid/taskid[.cleanup]/output/ and serves them to reduces
via MapOutputServlet
*Task logs' serving*
- Syslogs of tasks are created by the child jvm in
hadoop.log.dir|userlogs/taskid as syslog
- TaskTracker directly reads the tasks' logs and serves them via TaskLogServlet
In summary, the current directory structure follows. Unless otherwise stated,
directories/files are owned by TT but used by both TT and child.
{noformat}
taskTracker
  |- archive
  |- jobcache
       |- jobid
            |- work
            |- job.xml
            |- jars/job.jar
            |- taskid[.cleanup]
                 |- work
                 |- taskjvm.sh (created and used by TT)
                 |- output (owned by child, used by TT)
                      |- all intermediate output files (owned by child, used by TT)
mapred.child.tmp (owned and used by child)
hadoop.log.dir|userlogs (owned and used by child)
  |- taskid (owned and used by child)
       |- stdout (owned by child, used by TT)
       |- stderr (owned by child, used by TT)
       |- syslog (owned by child, used by TT)
{noformat}
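To make the ownership summary easier to reason about for access control, it can be written down as a small table. The sketch below is a hypothetical model: OwnershipSketch, Party, Access and needsCrossAccess are names invented for illustration and do not exist in Hadoop. It records only what the summary states and flags the paths that are used by a party other than their owner, which are presumably the ones where permissions need the most care.

```java
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the ownership summary; none of these names exist in Hadoop.
final class OwnershipSketch {
    enum Party { TT, CHILD }

    static final class Access {
        final Party owner;
        final Set<Party> usedBy;

        Access(Party owner, Set<Party> usedBy) {
            this.owner = owner;
            this.usedBy = usedBy;
        }

        /** True when some party other than the owner uses the path. */
        boolean needsCrossAccess() {
            for (Party p : usedBy) {
                if (p != owner) {
                    return true;
                }
            }
            return false;
        }
    }

    static Map<String, Access> table() {
        String job = "taskTracker/jobcache/jobid";
        String task = job + "/taskid[.cleanup]";
        String logs = "hadoop.log.dir|userlogs";
        Map<String, Access> t = new LinkedHashMap<>();

        // Default from the summary: owned by TT, used by both TT and child.
        String[] defaults = {
            "taskTracker/archive", job + "/work", job + "/job.xml",
            job + "/jars/job.jar", task + "/work"
        };
        for (String p : defaults) {
            t.put(p, new Access(Party.TT, EnumSet.allOf(Party.class)));
        }

        // Entries with an explicit note in the summary.
        t.put(task + "/taskjvm.sh", new Access(Party.TT, EnumSet.of(Party.TT)));
        // Also covers the intermediate output files inside output/.
        t.put(task + "/output", new Access(Party.CHILD, EnumSet.of(Party.TT)));
        t.put("mapred.child.tmp", new Access(Party.CHILD, EnumSet.of(Party.CHILD)));
        t.put(logs, new Access(Party.CHILD, EnumSet.of(Party.CHILD)));
        t.put(logs + "/taskid", new Access(Party.CHILD, EnumSet.of(Party.CHILD)));
        for (String f : new String[] {"stdout", "stderr", "syslog"}) {
            t.put(logs + "/taskid/" + f, new Access(Party.CHILD, EnumSet.of(Party.TT)));
        }
        return t;
    }
}
```

Iterating over table() and keeping the entries for which needsCrossAccess() returns true gives the list of paths that both the TT and the child must be able to reach.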
> Per-job local data on the TaskTracker node should have right access-control
> ---------------------------------------------------------------------------
>
> Key: HADOOP-4491
> URL: https://issues.apache.org/jira/browse/HADOOP-4491
> Project: Hadoop Core
> Issue Type: Sub-task
> Components: mapred, security
> Reporter: Arun C Murthy
> Assignee: Vinod K V
>