[
https://issues.apache.org/jira/browse/MAPREDUCE-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13420877#comment-13420877
]
Andy Isaacson commented on MAPREDUCE-2374:
------------------------------------------
Supposing that SELinux is not involved, here's some analysis under traditional
UNIX semantics. (I'll post another comment relevant to SELinux.)
The {{ETXTBSY}} error code from execve(2) happens when the executable (in this
case, a shell script) is still open for write by some process. Reading
the kernel code, there does not appear to be any chance for a "slight delay":
when the writer calls close(), the atomic {{inode->i_writecount}} is decremented,
and {{open_exec()}} correctly tests the atomic against zero.
So I strongly suspect that the writer does, in fact, still have the
file descriptor open for write when the execve happens.
I don't see how that's possible, though -- we clearly call {{w.close()}} in the
{{finally}} clause of {{writeCommand}}, and I don't see anywhere that we leak a
{{FileDescriptor}} to prevent {{FileOutputStream#close}} from triggering the
underlying {{close(2)}}.
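
For reference, the suspected condition (an exec attempted while a write
descriptor is still open) can be provoked deliberately. The following is only a
sketch, not code from {{TaskController}}: the class name {{TextBusyDemo}}, the
method name and the temp-file prefix are made up for illustration. On Linux I
would expect the exec attempt to be rejected with a "Text file busy" message,
but the result depends on the platform, so the sketch just reports what
happened.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class; nothing here is taken from Hadoop.
class TextBusyDemo {
    /** Tries to exec a script while this JVM still holds it open for write. */
    static String execWhileOpenForWrite() {
        Path script = null;
        try {
            script = Files.createTempFile("taskjvm-demo", ".sh");
            script.toFile().setExecutable(true);
            try (FileOutputStream out = new FileOutputStream(script.toFile())) {
                out.write("echo hello\n".getBytes(StandardCharsets.UTF_8));
                out.flush();
                // The write descriptor is deliberately still open at this point.
                try {
                    Process p = new ProcessBuilder(script.toString()).start();
                    return "started, exit code " + p.waitFor();
                } catch (IOException e) {
                    // The message carries the errno text reported by the OS.
                    return "exec failed: " + e.getMessage();
                }
            }
        } catch (IOException | InterruptedException e) {
            return "setup failed: " + e;
        } finally {
            if (script != null) {
                script.toFile().delete();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(execWhileOpenForWrite());
    }
}
```

Moving the {{ProcessBuilder}} call after the try-with-resources block (that is,
after {{close()}}) should make the exec succeed, which is the behaviour
{{writeCommand}} is supposed to have already.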
But...
We can avoid the {{ETXTBSY}} by avoiding the {{execve}}. If I'm reading
{{launchTask}} correctly, the script we're execing isn't even a valid shell
script anyway -- it's just a sequence of shell commands, without the leading
"#!/bin/sh" header. By running it as {{bash -c "/path/to/script"}}, we're relying
on the ancient pre-Bourne shell script convention that if execve() fails with
{{ENOEXEC}}, the shell tries to interpret the file as a script.
Instead, we can ask bash to run the file directly as a script by running
{{bash "/path/to/script"}}, leaving out the {{-c}}. bash then opens the file
for reading and interprets it itself, which avoids the {{execve}} code path
that triggers the {{ETXTBSY}} failure and is slightly less reliant on random
backwards compatibility kludges. And it doesn't break if we do have the
{{#!/bin/sh}} line, since that's just a comment.
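
To make the two invocations concrete, here is a minimal sketch of how the
argument lists differ. The class and method names ({{ScriptCommand}},
{{viaDashC}}, {{asScriptArgument}}, {{run}}) are hypothetical and do not
correspond to anything in {{launchTask}}; only the {{bash}} arguments matter.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, for illustration only.
class ScriptCommand {
    /** Current form: bash treats the path as a command string and execs the file. */
    static List<String> viaDashC(Path script) {
        return Arrays.asList("bash", "-c", script.toString());
    }

    /** Proposed form: bash reads the file itself and interprets it as a script. */
    static List<String> asScriptArgument(Path script) {
        return Arrays.asList("bash", script.toString());
    }

    /** Runs a command with inherited stdio and returns its exit code. */
    static int run(List<String> command) throws IOException, InterruptedException {
        return new ProcessBuilder(command).inheritIO().start().waitFor();
    }
}
```

With the second form the script file only needs to be readable, so it no longer
matters whether it carries the execute bit or a {{#!}} line.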
Now, suppose the undiscovered but hypothesized race condition in
{{writeCommand}} does exist, and affects the {{write}} as well as the
{{close}}. Then removing {{-c}} does not remove the race, and when we lose the
race the failure will probably be quieter: the command file might not be
completely written when bash reads it, and running the script might fail.
> Should not use PrintWriter to write taskjvm.sh
> ----------------------------------------------
>
> Key: MAPREDUCE-2374
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2374
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 0.22.0
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Fix For: 0.22.1
>
> Attachments: mapreduce-2374-on-20sec.txt
>
>
> Our use of PrintWriter in TaskController.writeCommand is unsafe, since that
> class swallows all IO exceptions. We're not currently checking for errors,
> which I'm seeing result in occasional task failures with the message "Text
> file busy" - assumedly because the close() call is failing silently for some
> reason.
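
The {{PrintWriter}} behaviour described in the issue can be seen in isolation
with a small sketch. {{FailingStream}} is a made-up stand-in for a descriptor
whose writes fail, and the class and method names are hypothetical. The sketch
contrasts {{PrintWriter}}, which only records the failure in a flag, with a
plain {{Writer}}, which throws.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not code from TaskController.
class SwallowedErrorDemo {
    /** Made-up stream that fails every write. */
    static final class FailingStream extends OutputStream {
        @Override
        public void write(int b) throws IOException {
            throw new IOException("simulated write failure");
        }
    }

    /** println() and close() return normally; the failure is only visible via checkError(). */
    static boolean printWriterErrorFlag() {
        PrintWriter w = new PrintWriter(new FailingStream());
        w.println("echo hello");
        w.close();
        return w.checkError();
    }

    /** A plain Writer surfaces the same failure as an IOException. */
    static boolean plainWriterThrows() {
        try (Writer w = new OutputStreamWriter(new FailingStream(), StandardCharsets.UTF_8)) {
            w.write("echo hello\n");
            w.flush();
            return false;
        } catch (IOException e) {
            return true;
        }
    }
}
```

Code that keeps {{PrintWriter}} would have to call {{checkError()}} after the
final write to notice the problem; a plain {{Writer}} makes the failure
impossible to ignore.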