[ https://issues.apache.org/jira/browse/MAPREDUCE-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12983267#action_12983267 ]
Todd Lipcon commented on MAPREDUCE-2238:
----------------------------------------
Spent some time adding logging and looping the tests to figure out this
problem. I think I have it cracked.
The issue is not multiple threads within one process calling setPermission()
concurrently. Rather, one thread calls setPermission on the *parent*
directory of a file while another thread (actually another entire process)
calls setPermission on that file itself.
In particular, these two invocations race:
2011-01-18 09:00:40,958 INFO tasktracker.Localizer (Localizer.java:setPermissions(129)) - Thread[TaskLauncher for MAP tasks,5,main]: About to set permissions on /data/1/todd/cdh/repos/cdh3/hadoop-0.20/build/test/logs/userlogs/job_20110118090037816_0001
java.lang.Exception
	at org.apache.hadoop.mapreduce.server.tasktracker.Localizer$PermissionsHandler.setPermissions(Localizer.java:129)
	at org.apache.hadoop.mapreduce.server.tasktracker.Localizer.initializeJobLogDir(Localizer.java:429)
	at org.apache.hadoop.mapred.TaskTracker.initializeJobLogDir(TaskTracker.java:1072)
	at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:969)
	at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:2209)

2011-01-18 09:00:40,985 INFO tasktracker.Localizer (Localizer.java:setPermissions(129)) - Thread[Thread-213,5,main]: About to set permissions on /data/1/todd/cdh/repos/cdh3/hadoop-0.20/build/test/logs/userlogs/job_20110118090037816_0001/attempt_20110118090037816_0001_m_000005_0
java.lang.Exception
	at org.apache.hadoop.mapreduce.server.tasktracker.Localizer$PermissionsHandler.setPermissions(Localizer.java:129)
	at org.apache.hadoop.mapred.TaskRunner.prepareLogFiles(TaskRunner.java:285)
	at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:198)
The above traces are from a 0.20-based branch, but I expect trunk has the
same problem.
The issue is that the first invocation above momentarily flips the job_<id>
directory to mode 000. Because resolving a path requires search (execute)
permission on every ancestor directory, the stat/chmod calls on the attempt
directory fail with EACCES during that window, which can leave the attempt
directory with the wrong permissions. I have strace output which shows this
as well.
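A minimal sketch of how the directory ends up at 000, assuming the permissions handler goes through the java.io.File setters (setReadable/setWritable/setExecutable), which can only toggle one permission class per call. The clear-then-set sequence in setOwnerOnlyViaFileApi is a hypothetical reconstruction consistent with the 000 window described above, not the actual Localizer code, and PermissionRace and its methods are made-up names. For contrast, setAtomically uses NIO.2's Files.setPosixFilePermissions, which applies the whole mode in one call (on Linux, OpenJDK does this with a single chmod(2)). NIO.2 only shipped with Java 7 in mid-2011, so it wasn't available to Hadoop when this comment was written.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermissions;

public class PermissionRace {

  // Hypothetical reconstruction of the java.io.File approach: each call
  // toggles one bit class, so reaching 0700 means clearing everything first.
  static void setOwnerOnlyViaFileApi(File f) {
    f.setReadable(false, false);   // clear r for owner, group and other
    f.setWritable(false, false);   // clear w for owner, group and other
    f.setExecutable(false, false); // clear x for everyone: mode is now 000
    // Race window: a non-root process resolving a path *through* f (e.g. the
    // attempt directory beneath a job directory) gets EACCES here.
    f.setReadable(true, true);     // owner r
    f.setWritable(true, true);     // owner w
    f.setExecutable(true, true);   // owner x: mode is now 0700
  }

  // NIO.2 (Java 7+): the whole mode is applied in one call, so other
  // processes never observe an intermediate mode such as 000.
  static void setAtomically(Path p, String mode) throws IOException {
    Files.setPosixFilePermissions(p, PosixFilePermissions.fromString(mode));
  }

  static String modeOf(Path p) throws IOException {
    return PosixFilePermissions.toString(Files.getPosixFilePermissions(p));
  }

  public static void main(String[] args) throws IOException {
    Path jobDir = Files.createTempDirectory("job_");
    setOwnerOnlyViaFileApi(jobDir.toFile());
    System.out.println(modeOf(jobDir)); // prints rwx------
    setAtomically(jobDir, "rwxr-x---");
    System.out.println(modeOf(jobDir)); // prints rwxr-x---
    Files.delete(jobDir);
  }
}
```

Both methods end at a sensible mode; the difference is only what a concurrent process can observe in between, which is exactly where the attempt-directory chmod gets caught.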
I think we should do away with this Java API nonsense altogether, call
chmod(2) directly through native code, and fall back to forking a chmod
process when the native library isn't available.
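A sketch of that direction under stated assumptions: nativeChmodLoaded() stands in for a hypothetical check that a JNI binding to chmod(2) was loaded (no native code is included here, so it always returns false), and the fallback forks the system chmod binary via ProcessBuilder. Either path sets the full octal mode in one step, so a concurrent process never sees an intermediate mode like 000. ChmodFallback and its methods are illustrative names, not Hadoop APIs.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.attribute.PosixFilePermissions;

public class ChmodFallback {

  // Hypothetical flag: a real build would report whether a JNI binding to
  // chmod(2) was loaded. This sketch ships no native code, so it is false.
  static boolean nativeChmodLoaded() {
    return false;
  }

  // Sets the full octal mode (e.g. "750") in one step, never passing
  // through an intermediate mode the way the java.io.File setters do.
  static void chmod(File target, String octalMode) throws IOException, InterruptedException {
    if (nativeChmodLoaded()) {
      // The JNI call would go here; it is not part of this sketch.
      throw new UnsupportedOperationException("native chmod not included");
    }
    // Fallback: fork/exec the chmod binary, which applies the mode with a
    // single system call.
    Process p = new ProcessBuilder("chmod", octalMode, target.getAbsolutePath())
        .redirectErrorStream(true)
        .start();
    String output = new String(p.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
    int exitCode = p.waitFor();
    if (exitCode != 0) {
      throw new IOException("chmod " + octalMode + " " + target + " failed (exit "
          + exitCode + "): " + output.trim());
    }
  }

  public static void main(String[] args) throws Exception {
    File dir = Files.createTempDirectory("attempt_").toFile();
    chmod(dir, "750");
    // prints rwxr-x---
    System.out.println(PosixFilePermissions.toString(Files.getPosixFilePermissions(dir.toPath())));
    Files.delete(dir.toPath());
  }
}
```

Forking a process per call is comparatively expensive, which is presumably why the native path would be the preferred one whenever it is available.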
> Undeletable build directories
> ------------------------------
>
> Key: MAPREDUCE-2238
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2238
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: build, test
> Affects Versions: 0.23.0
> Reporter: Eli Collins
> Attachments: mapreduce-2238.txt
>
>
> The MR Hudson job is failing. It looks like it's due to a test chmod'ing a
> build directory, so the checkout can't clean the build dir.
> https://hudson.apache.org/hudson/job/Hadoop-Mapreduce-trunk/549/console
> Building remotely on hadoop7
> hudson.util.IOException2: remote file operation failed: /grid/0/hudson/hudson-slave/workspace/Hadoop-Mapreduce-trunk at hudson.remoting.Channel@2545938c:hadoop7
> at hudson.FilePath.act(FilePath.java:749)
> at hudson.FilePath.act(FilePath.java:735)
> at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:589)
> at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:537)
> at hudson.model.AbstractProject.checkout(AbstractProject.java:1116)
> at hudson.model.AbstractBuild$AbstractRunner.checkout(AbstractBuild.java:479)
> at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:411)
> at hudson.model.Run.run(Run.java:1324)
> at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
> at hudson.model.ResourceController.execute(ResourceController.java:88)
> at hudson.model.Executor.run(Executor.java:139)
> Caused by: java.io.IOException: Unable to delete
> /grid/0/hudson/hudson-slave/workspace/Hadoop-Mapreduce-trunk/trunk/build/test/logs/userlogs/job_20101230131139886_0001/attempt_20101230131139886_0001_m_000000_0