TaskRunner logDir race condition leads to crash on job-acl.xml creation
-----------------------------------------------------------------------
Key: MAPREDUCE-2041
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2041
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: task
Affects Versions: 0.22.0
Environment: Linux/x86-64, 32-bit Java, NFS source tree
Reporter: Greg Roelofs
TaskRunner's prepareLogFiles() logs a warning when mkdirs() fails but otherwise
ignores the failure, and it never checks the return value of setPermissions().
Either call can fail (e.g., on NFS, where there appears to be a TOCTOU-style
race, except with C = "creation"), in which case the subsequent creation of
job-acl.xml in writeJobACLs() also fails, killing the task:
{noformat}
2010-08-26 20:18:10,334 INFO mapred.TaskInProgress (TaskInProgress.java:updateStatus(591)) - Error from attempt_20100826201758813_0001_m_000001_0 on tracker_host2.rack.com:rh45-64/127.0.0.1:35112: java.lang.Throwable: Child Error
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:229)
Caused by: java.io.FileNotFoundException: /home/<username>/grid/trunk/hadoop-mapreduce/build/test/logs/userlogs/job_20100826201758813_0001/attempt_20100826201758813_0001_m_000001_0/job-acl.xml (No such file or directory)
    at java.io.FileOutputStream.open(Native Method)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:179)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:131)
    at org.apache.hadoop.mapred.TaskRunner.writeJobACLs(TaskRunner.java:307)
    at org.apache.hadoop.mapred.TaskRunner.prepareLogFiles(TaskRunner.java:290)
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:199)
{noformat}
This in turn causes TestTrackerBlacklistAcrossJobs to fail sporadically. The
job-acl.xml failure always seems to affect host2, and to do so sooner than the
intentional exception is raised on host1, so the wrong host is job-blacklisted
and the test's assertion fails.
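
For illustration only, the kind of checking described above might look like the
following sketch. It is not the TaskRunner code or a proposed patch: the class
and method names (LogDirSketch, ensureDir, restrictToOwner) are hypothetical,
and the permission handling uses plain java.io.File calls as a simplified
stand-in for whatever setPermissions() actually does.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical helper, not part of Hadoop: fail fast instead of ignoring errors.
final class LogDirSketch {

    // Create the directory, treating "already exists" as success.
    static void ensureDir(File dir) throws IOException {
        // mkdirs() returns false both on real failure and when the directory
        // already exists (e.g., another thread created it first), so re-check
        // with isDirectory() before declaring an error.
        if (!dir.mkdirs() && !dir.isDirectory()) {
            throw new IOException("Could not create log directory: " + dir);
        }
    }

    // Grant owner read/write/execute, checking every return value.
    // Simplified assumption: this does not revoke access for other users.
    static void restrictToOwner(File dir) throws IOException {
        boolean ok = dir.setReadable(true, true)
                && dir.setWritable(true, true)
                && dir.setExecutable(true, true);
        if (!ok) {
            throw new IOException("Could not set permissions on: " + dir);
        }
    }
}
```

With checks like these, a failed directory creation would surface as an
IOException naming the log directory, rather than as a later
FileNotFoundException for job-acl.xml.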