[ 
https://issues.apache.org/jira/browse/HADOOP-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716620#action_12716620
 ] 

Vinod K V commented on HADOOP-4491:
-----------------------------------

I am summarizing the current state of local data management on the TT.

*Localization of task on the TT:*

 - Job localization
   -- happens once per job
   -- creates the taskTracker/jobcache/jobid and taskTracker/jobcache/jobid/work
directories recursively
   -- downloads job.xml to taskTracker/jobcache/jobid/job.xml and the job jar to
taskTracker/jobcache/jobid/jars/job.jar

 - Task localization
   -- happens once per task
   -- creates task's work directory recursively: 
taskTracker/jobcache/jobid/taskid[.cleanup]/work
   -- if needed, localizes archives and files for the distributed cache to
taskTracker/archive and/or creates symlinks in the task's work directory, and
rewrites job.xml
   -- creates the mapred.child.tmp directory
   -- creates hadoop.log.dir|userlogs/taskid/ recursively and sets up the child
to redirect its stdout and stderr to this directory
   -- creates taskjvm.sh when the Linux task controller is used
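
For illustration, the directory-creation part of the two localization steps
above can be sketched in Python. This is only a sketch under stated
assumptions: the helper names (job_dir, task_dir, localize_job, localize_task)
and the sample ids are hypothetical and are not TaskTracker APIs, and the
sketch creates directories under a scratch root only. It does not download
job.xml or job.jar, and it does not touch the distributed cache.

```python
import tempfile
from pathlib import Path


def job_dir(local_root: Path, job_id: str) -> Path:
    """Return taskTracker/jobcache/jobid under local_root (hypothetical helper)."""
    return local_root / "taskTracker" / "jobcache" / job_id


def task_dir(local_root: Path, job_id: str, task_id: str,
             cleanup: bool = False) -> Path:
    """Return taskTracker/jobcache/jobid/taskid[.cleanup] (hypothetical helper)."""
    name = task_id + (".cleanup" if cleanup else "")
    return job_dir(local_root, job_id) / name


def localize_job(local_root: Path, job_id: str) -> Path:
    """Once per job: create the job directory with its work and jars dirs."""
    jdir = job_dir(local_root, job_id)
    (jdir / "work").mkdir(parents=True, exist_ok=True)
    (jdir / "jars").mkdir(parents=True, exist_ok=True)
    return jdir


def localize_task(local_root: Path, job_id: str, task_id: str,
                  cleanup: bool = False) -> Path:
    """Once per task: create the task's work directory recursively."""
    tdir = task_dir(local_root, job_id, task_id, cleanup)
    (tdir / "work").mkdir(parents=True, exist_ok=True)
    return tdir


if __name__ == "__main__":
    # Sample ids are made up for the example.
    root = Path(tempfile.mkdtemp())
    print(localize_job(root, "job_0001").relative_to(root))
    print(localize_task(root, "job_0001", "attempt_0001").relative_to(root))
```

Both helpers are idempotent (exist_ok=True), so calling localize_task before
localize_job still produces a consistent tree.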

*Intermediate output files*
 - All intermediate files are created by the tasks, which run as the user, in
taskTracker/jobcache/jobid/taskid[.cleanup]/output/. This directory is created
recursively on demand by the child JVM.
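
As a rough illustration of "created on demand", the sketch below creates the
output directory only when a path inside it is first requested. The function
name intermediate_output_path and the file name used in the example are
hypothetical; the real child JVM code is Java and may differ in detail.

```python
import tempfile
from pathlib import Path


def intermediate_output_path(task_dir: Path, file_name: str) -> Path:
    """Return task_dir/output/file_name, creating output/ if it is missing."""
    out_dir = task_dir / "output"
    # parents=True also creates a missing task directory, mirroring the
    # "recursively created" wording above.
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir / file_name


if __name__ == "__main__":
    # The task directory and file name are made up for the example.
    tdir = Path(tempfile.mkdtemp()) / "taskTracker" / "jobcache" / "job_0001" / "attempt_0001"
    path = intermediate_output_path(tdir, "example.out")
    path.write_bytes(b"")
    print(path.name)
```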

*Intermediate output serving*
 - TaskTracker directly reads the intermediate map output files from
taskTracker/jobcache/jobid/taskid[.cleanup]/output/ and serves them to reduces
via MapOutputServlet

*Task logs' serving*
 - Each task's syslog is created by the child JVM as
hadoop.log.dir|userlogs/taskid/syslog
 - TaskTracker directly reads the tasks' logs and serves them via
TaskLogServlet
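
A minimal sketch of how a per-task log path could be resolved, assuming that
"hadoop.log.dir|userlogs" means the userlogs directory under hadoop.log.dir.
The function name task_log_path and the sample values are hypothetical and do
not correspond to the TaskLogServlet code.

```python
from pathlib import Path

# The three per-task log files named in this comment.
LOG_KINDS = ("stdout", "stderr", "syslog")


def task_log_path(hadoop_log_dir: Path, task_id: str, kind: str) -> Path:
    """Return <hadoop.log.dir>/userlogs/<taskid>/<kind> for a known log kind."""
    if kind not in LOG_KINDS:
        raise ValueError(f"unknown log kind: {kind!r}")
    return hadoop_log_dir / "userlogs" / task_id / kind


if __name__ == "__main__":
    # The log directory and task id are made up for the example.
    for kind in LOG_KINDS:
        print(task_log_path(Path("logs"), "attempt_0001", kind))
```

Rejecting unknown kinds keeps the lookup confined to the three files listed
above rather than to arbitrary names under the task's log directory.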

In summary, the current directory structure is as follows. Unless otherwise
stated, directories/files are owned by the TT but used by both the TT and the
child.
{noformat}
taskTracker
    |- archive
    |- jobcache
        |- jobid
            |- work
            |- job.xml
            |- jars/job.jar
            |- taskid[.cleanup]
                |- work
                    |- taskjvm.sh ( created and used by TT)
                |- output ( owned by child, used by TT)
                    |- all intermediate output files ( owned by child, used by TT)

mapred.child.tmp ( owned and used by child)

hadoop.log.dir|userlogs (owned and used by child)
    |- taskid       (owned and used by child)
        |- stdout   ( owned by child, used by TT)
        |- stderr   ( owned by child, used by TT)
        |- syslog  ( owned by child, used by TT)
{noformat}
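
The ownership annotations above can also be written down as data, which makes
it easy to list the entries where one side reads what the other side owns.
This is only a sketch: the Entry type, the entry() helper and the short entry
labels are my own shorthand for the tree above, not anything in the code base.

```python
from typing import FrozenSet, List, NamedTuple

TT = "TT"
CHILD = "child"


class Entry(NamedTuple):
    name: str
    owner: str
    users: FrozenSet[str]


def entry(name: str, owner: str = TT, users=(TT, CHILD)) -> Entry:
    """Default mirrors the summary: owned by TT, used by both TT and child."""
    return Entry(name, owner, frozenset(users))


# Labels are shorthand for the entries in the tree above.
LAYOUT: List[Entry] = [
    entry("archive"),
    entry("jobid/work"),
    entry("job.xml"),
    entry("jars/job.jar"),
    entry("taskid[.cleanup]/work"),
    entry("taskjvm.sh", owner=TT, users=(TT,)),
    entry("output", owner=CHILD, users=(TT,)),
    entry("intermediate output files", owner=CHILD, users=(TT,)),
    entry("mapred.child.tmp", owner=CHILD, users=(CHILD,)),
    entry("userlogs", owner=CHILD, users=(CHILD,)),
    entry("userlogs/taskid", owner=CHILD, users=(CHILD,)),
    entry("stdout", owner=CHILD, users=(TT,)),
    entry("stderr", owner=CHILD, users=(TT,)),
    entry("syslog", owner=CHILD, users=(TT,)),
]


def used_by_other(layout: List[Entry], owner: str, user: str) -> List[str]:
    """Names of entries owned by `owner` that `user` (a different party) uses."""
    return [e.name for e in layout
            if e.owner == owner and user != owner and user in e.users]


if __name__ == "__main__":
    print("owned by child, used by TT:", used_by_other(LAYOUT, CHILD, TT))
    print("owned by TT, used by child:", used_by_other(LAYOUT, TT, CHILD))
```

The two printed lists are the places where any access-control change has to
keep the other party's access working.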


> Per-job local data on the TaskTracker node should have right access-control
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-4491
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4491
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Vinod K V
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
