[
https://issues.apache.org/jira/browse/MAPREDUCE-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
sam liu updated MAPREDUCE-4490:
-------------------------------
Attachment: MAPREDUCE-4490.patch
Updated the patch to remove the create_attempt_directories() invocation from
task-controller.c#run_task_as_user(). That invocation is unnecessary because
task-controller.c#initialize_task() always does the same work.
> JVM reuse is incompatible with LinuxTaskController (and therefore
> incompatible with Security)
> ---------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-4490
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4490
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: task-controller, tasktracker
> Affects Versions: 0.20.205.0, 1.0.3, 1.2.1
> Reporter: George Datskos
> Assignee: sam liu
> Priority: Critical
> Labels: patch
> Fix For: 1.2.1
>
> Attachments: MAPREDUCE-4490.patch, MAPREDUCE-4490.patch
>
>
> When using LinuxTaskController, JVM reuse (mapred.job.reuse.jvm.num.tasks >
> 1) with more map tasks in a job than there are map slots in the cluster will
> result in immediate task failures for the second task in each JVM (and then
> the JVM exits). We have investigated this bug and the root cause is as
> follows. When using LinuxTaskController, the userlog directory for a task
> attempt (../userlogs/job/task-attempt) is created only for the first attempt,
> when the JVM is launched, because userlog directories are created by the
> task-controller binary, which runs only *once* per JVM. Therefore, attempting
> to create log.index for any subsequent task is guaranteed to fail with
> ENOENT, leading to immediate task failure and child JVM exit.
> {quote}
> 2012-07-24 14:29:11,914 INFO org.apache.hadoop.mapred.TaskLog: Starting
> logging for a new task attempt_201207241401_0013_m_000027_0 in the same JVM
> as that of the first task
> /var/log/hadoop/mapred/userlogs/job_201207241401_0013/attempt_201207241401_0013_m_000006_0
> 2012-07-24 14:29:11,915 WARN org.apache.hadoop.mapred.Child: Error running
> child
> ENOENT: No such file or directory
> at org.apache.hadoop.io.nativeio.NativeIO.open(Native Method)
> at
> org.apache.hadoop.io.SecureIOUtils.createForWrite(SecureIOUtils.java:161)
> at org.apache.hadoop.mapred.TaskLog.writeToIndexFile(TaskLog.java:296)
> at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:369)
> at org.apache.hadoop.mapred.Child.main(Child.java:229)
> {quote}
> The above error occurs in a JVM which runs tasks 6 and 27. Task 6 goes
> smoothly. Then Task 27 starts. The directory
> /var/log/hadoop/mapred/userlogs/job_201207241401_0013/attempt_201207241401_0013_m_000027_0
> is never created, so when mapred.Child tries to write the log.index file for
> Task 27, it fails with ENOENT because the
> attempt_201207241401_0013_m_000027_0 directory does not exist. Therefore,
> the second task in each JVM is guaranteed to fail (and then the JVM exits)
> whenever LinuxTaskController is used. Note that this problem does not occur
> with the DefaultTaskController, because the userlog directories are created
> for each task (not just once per JVM as with LinuxTaskController).
> For each task, the TaskRunner calls the TaskController's createLogDir method
> before attempting to write out an index file; the two implementations are
> contrasted below:
> * DefaultTaskController#createLogDir: creates the log directory for each task
> * LinuxTaskController#createLogDir: does nothing
> ** the task-controller binary creates the log directory [create_attempt_directories]
> (but only for the first task in each JVM)
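> For illustration, a minimal sketch of that contrast, with hypothetical,
> simplified signatures (the real branch-1 methods take a TaskAttemptID and
> other arguments); this is not the actual Hadoop code:
> {code:java}
> import java.io.File;
> import java.io.IOException;
>
> // Sketch of DefaultTaskController#createLogDir behaviour: build
> // .../userlogs/<job>/<attempt> on every call, so each task in a reused
> // JVM gets its own directory.
> class DefaultStyleLogDirCreator {
>   void createLogDir(File userlogRoot, String jobId, String attemptId)
>       throws IOException {
>     File attemptDir = new File(new File(userlogRoot, jobId), attemptId);
>     if (!attemptDir.mkdirs() && !attemptDir.isDirectory()) {
>       throw new IOException("Cannot create log directory " + attemptDir);
>     }
>   }
> }
>
> // Sketch of LinuxTaskController#createLogDir behaviour: a no-op in the Java
> // layer; directory creation happens in the setuid task-controller binary,
> // which runs only once per JVM, so the second task's directory never exists.
> class LinuxStyleLogDirCreator {
>   void createLogDir(File userlogRoot, String jobId, String attemptId) {
>     // intentionally empty
>   }
> }
> {code}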
> Possible Solution: add a new command to task-controller, *initialize task*, to
> create the attempt directories. Call that command, via ShellCommandExecutor,
> from the LinuxTaskController#createLogDir method (a rough sketch follows).
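> A rough sketch of that approach, assuming a hypothetical "initialize-task"
> command in the task-controller binary (the command name, argument order, and
> class shape are illustrative assumptions, not the actual patch):
> {code:java}
> import java.io.IOException;
> import org.apache.hadoop.util.Shell.ShellCommandExecutor;
>
> // Hypothetical sketch: shell out to the setuid task-controller binary from
> // createLogDir so the attempt log directory is created for every task,
> // not just the first task in each reused JVM.
> class LinuxTaskControllerLogDirSketch {
>   private final String taskControllerExe; // path to the task-controller binary
>
>   LinuxTaskControllerLogDirSketch(String taskControllerExe) {
>     this.taskControllerExe = taskControllerExe;
>   }
>
>   void createLogDir(String user, String jobId, String attemptId)
>       throws IOException {
>     String[] command = {
>         taskControllerExe,
>         user,
>         "initialize-task",   // assumed new command implemented by the binary
>         jobId,
>         attemptId
>     };
>     ShellCommandExecutor shExec = new ShellCommandExecutor(command);
>     shExec.execute();        // throws on a non-zero exit code, surfacing the failure
>   }
> }
> {code}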
--
This message was sent by Atlassian JIRA
(v6.1#6144)