[ https://issues.apache.org/jira/browse/MAPREDUCE-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12978551#action_12978551 ]
Greg Roelofs commented on MAPREDUCE-2238:
-----------------------------------------
bq. I guess it could be a test timing out right as a setPermissions is done, interrupting in the middle... but seems pretty unlikely, don't you think?
Yes. I'm guessing it's more subtle than that and lies within the core MR code
or the JVM. The fact that I see it semi-frequently on NFS (that is, more
often than on Hudson or in production) suggests either a timing issue (NFS is
slow), perhaps via an erroneous assumption of synchronous behavior, or else an
erroneous assumption of an infallible system call. It could be other things as
well, of course, but those seem to me like the most probable candidates.
bq. I agree we could work around it for the tests, but I'm nervous whether we will see this issue crop up in production. Have you guys at Yahoo seen this on any clusters running secure YDH?
To clarify, I was suggesting working around it in the MR code itself, not
realizing that the Hudson backtrace wasn't using MR code at all. (Well,
apparently.) So I'm not sure where that leaves us, other than trying to fix
the actual set-permissions problem. It seems like no one's basic
deleteRecursive() implementation includes an option to attempt a chmod()
before failing on bad permissions?
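For illustration, a minimal sketch of what such a chmod-before-fail recursive delete might look like. This is not Hadoop or Hudson code; the class and method names (ChmodDelete, deleteRecursiveWithChmod) are hypothetical, and it just uses java.io.File's setWritable()/setExecutable() to restore permissions and retry once when a plain delete() fails:

```java
import java.io.File;

/**
 * Hypothetical sketch: a recursive delete that, on failure, tries to
 * restore permissions and retries once -- the "attempt a chmod() before
 * failing" option suggested above. Not production Hadoop code.
 */
public class ChmodDelete {
    public static boolean deleteRecursiveWithChmod(File f) {
        if (f.isDirectory()) {
            File[] children = f.listFiles();
            if (children == null) {
                // listFiles() returns null on an unreadable directory;
                // restore read/execute permission and try listing again.
                f.setReadable(true);
                f.setExecutable(true);
                children = f.listFiles();
            }
            if (children != null) {
                for (File c : children) {
                    deleteRecursiveWithChmod(c);
                }
            }
        }
        if (f.delete()) {
            return true;
        }
        // delete() failed -- commonly because the *parent* directory is
        // not writable; make it writable/traversable and retry once.
        File parent = f.getParentFile();
        if (parent != null) {
            parent.setWritable(true);
            parent.setExecutable(true);
        }
        return f.delete();
    }

    public static void main(String[] args) throws Exception {
        // Simulate the Hudson failure: a read-only directory whose
        // contents a plain recursive delete cannot remove.
        File dir = new File(System.getProperty("java.io.tmpdir"), "chmoddel-demo");
        File sub = new File(dir, "userlogs");
        sub.mkdirs();
        new File(sub, "attempt_log").createNewFile();
        sub.setWritable(false);
        System.out.println(deleteRecursiveWithChmod(dir) && !dir.exists());
    }
}
```

A plain File.delete() walk would fail on the file inside the read-only directory; the retry-after-chmod path is what lets the cleanup succeed.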
Anyway, yes, I _think_ we've seen it in production with 0.20S or later, but it
wasn't while I was on call, so I might be remembering a different issue with
similar symptoms. Sorry...there are lots of interesting failure modes in
Hadoop, and my memory is finite. :-)
> Undeletable build directories
> ------------------------------
>
> Key: MAPREDUCE-2238
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2238
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: build, test
> Affects Versions: 0.23.0
> Reporter: Eli Collins
>
> The MR hudson job is failing, looks like it's due to a test chmod'ing a build
> directory so the checkout can't clean the build dir.
> https://hudson.apache.org/hudson/job/Hadoop-Mapreduce-trunk/549/console
> Building remotely on hadoop7
> hudson.util.IOException2: remote file operation failed: /grid/0/hudson/hudson-slave/workspace/Hadoop-Mapreduce-trunk at hudson.remoting.chan...@2545938c:hadoop7
> at hudson.FilePath.act(FilePath.java:749)
> at hudson.FilePath.act(FilePath.java:735)
> at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:589)
> at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:537)
> at hudson.model.AbstractProject.checkout(AbstractProject.java:1116)
> at hudson.model.AbstractBuild$AbstractRunner.checkout(AbstractBuild.java:479)
> at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:411)
> at hudson.model.Run.run(Run.java:1324)
> at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
> at hudson.model.ResourceController.execute(ResourceController.java:88)
> at hudson.model.Executor.run(Executor.java:139)
> Caused by: java.io.IOException: Unable to delete
> /grid/0/hudson/hudson-slave/workspace/Hadoop-Mapreduce-trunk/trunk/build/test/logs/userlogs/job_20101230131139886_0001/attempt_20101230131139886_0001_m_000000_0
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.