[ https://issues.apache.org/jira/browse/MAPREDUCE-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13420895#comment-13420895 ]
Shrinivas Joshi commented on MAPREDUCE-2374:
--------------------------------------------
I collected lsof output before and after the call to
shExec.execute() and did not see any process holding a handle on the
taskjvm.sh file. I agree with Andy: the problem here does not seem to be
caused by an open FD. I also tried a change in which
RawLocalFileSystem.sync() is explicitly called on the underlying output data
stream after the write to taskjvm.sh. I assume that the sync() method
issues the fsync system call, which should ensure that the data is committed
to the underlying storage.
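For reference, here is a minimal standalone sketch of the "write, flush, then
fsync" idea using only the JDK. It is not the Hadoop code path: the class and
method names (SyncedScriptWriter, writeSynced) are hypothetical, and it uses
FileDescriptor.sync() directly instead of RawLocalFileSystem.sync(). Unlike
PrintWriter, every failure here surfaces as an IOException.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Hadoop.
public class SyncedScriptWriter {

    /** Writes the script, flushes it, and asks the OS to commit it before closing. */
    public static void writeSynced(Path script, String contents) throws IOException {
        try (FileOutputStream out = new FileOutputStream(script.toFile())) {
            Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8);
            w.write(contents);
            w.flush();           // push buffered characters into the FileOutputStream
            out.getFD().sync();  // throws SyncFailedException if the sync cannot be done
        }                        // close() failures also propagate as IOException
    }

    public static void main(String[] args) throws IOException {
        Path script = Files.createTempFile("taskjvm", ".sh");
        writeSynced(script, "#!/bin/sh\necho hello\n");
        System.out.println(Files.readAllLines(script));
    }
}
```

Note that this only covers the file's own data; it says nothing about whether
ETXTBSY goes away, which is the open question in this issue.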
In the strace output I have noticed ENOEXEC errors for both successful and
unsuccessful (ETXTBSY) invocations of bash -c /path/to/taskjvm.sh, so
I am not sure that this is the root cause. However, it doesn't hurt to add the
#!/bin/sh construct.
Next, I plan to capture strace output for the child processes
forked from the taskjvm.sh script. I also need to try
calling sync on the whole directory containing taskjvm.sh.
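As a rough sketch of what syncing the containing directory could look like
from Java: opening the directory read-only through FileChannel and calling
force(true) is a commonly used way to fsync a directory on Linux. This is an
assumption about the platform, not something the JDK documents, and it is
known to fail on some systems (for example Windows). The class and method
names are hypothetical.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of Hadoop.
public class DirSync {

    /** Attempts to fsync a directory so that entries created in it are durable. */
    public static void syncDirectory(Path dir) throws IOException {
        // Assumption: the OS allows opening a directory for reading (true on Linux).
        try (FileChannel ch = FileChannel.open(dir, StandardOpenOption.READ)) {
            ch.force(true); // flush data and metadata for the directory
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("taskdir");
        Files.write(dir.resolve("taskjvm.sh"), "#!/bin/sh\n".getBytes("UTF-8"));
        syncDirectory(dir);
        System.out.println("synced " + dir);
    }
}
```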
There is probably something internal at the kernel level that is causing this
issue. As I mentioned earlier, this problem is much more pronounced on RHEL 6.2
(stock kernel) than on Ubuntu 11.04 server.
> Should not use PrintWriter to write taskjvm.sh
> ----------------------------------------------
>
> Key: MAPREDUCE-2374
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2374
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 0.22.0
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Fix For: 0.22.1
>
> Attachments: mapreduce-2374-on-20sec.txt
>
>
> Our use of PrintWriter in TaskController.writeCommand is unsafe, since that
> class swallows all IO exceptions. We're not currently checking for errors,
> which I'm seeing result in occasional task failures with the message "Text
> file busy", presumably because the close() call is failing silently for some
> reason.
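
To illustrate the behavior described in the issue summary above, here is a
small self-contained sketch. FailingStream is a hypothetical stand-in for an
output stream whose writes fail; it is not Hadoop code. PrintWriter reports
the failure only through checkError(), while a plain Writer throws.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.Writer;

public class PrintWriterSwallowDemo {

    /** Hypothetical stream whose every write fails, standing in for a bad disk. */
    static final class FailingStream extends OutputStream {
        @Override
        public void write(int b) throws IOException {
            throw new IOException("simulated write failure");
        }
    }

    /** PrintWriter never throws; the failure is visible only via checkError(). */
    public static boolean printWriterHidesFailure() {
        PrintWriter pw = new PrintWriter(new FailingStream());
        pw.println("echo hello");
        pw.close();              // completes without an exception
        return pw.checkError();  // true: the error was recorded but not thrown
    }

    /** A plain Writer propagates the same failure as an IOException. */
    public static boolean plainWriterThrows() {
        try (Writer w = new OutputStreamWriter(new FailingStream())) {
            w.write("echo hello\n");
            w.flush();
            return false;
        } catch (IOException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("PrintWriter hid the failure: " + printWriterHidesFailure());
        System.out.println("Plain Writer threw:          " + plainWriterThrows());
    }
}
```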