[ https://issues.apache.org/jira/browse/MAPREDUCE-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13420895#comment-13420895 ]

Shrinivas Joshi commented on MAPREDUCE-2374:
--------------------------------------------

I tried collecting lsof output before and after the call to 
shExec.execute(), and I do not see any process holding on to a handle for the 
taskjvm.sh file. I agree with Andy: the problem here does not seem to be caused 
by an open FD. I have also tried a change that explicitly calls 
RawLocalFileSystem.sync() on the underlying output stream after taskjvm.sh is 
written. I assume that sync() issues the fsync system call, which should ensure 
that the data is committed to the underlying storage.
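For reference, here is a minimal sketch of the "write, then fsync" idea using plain java.io, assuming a standalone helper. The class DurableScriptWriter and its method writeAndSync are hypothetical names, not Hadoop's RawLocalFileSystem API. FileDescriptor.sync() is the JDK call that maps to fsync on POSIX systems, and, unlike with PrintWriter, every failure here (including one from close()) reaches the caller as an IOException.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

// Hypothetical helper for illustration; not part of Hadoop.
final class DurableScriptWriter {
    private DurableScriptWriter() {}

    /**
     * Writes the script and asks the OS to push it to stable storage before
     * returning. Any failure surfaces as an IOException instead of being swallowed.
     */
    static void writeAndSync(Path script, String contents) throws IOException {
        // try-with-resources guarantees close(), and a failing close() still throws.
        try (FileOutputStream out = new FileOutputStream(script.toFile())) {
            out.write(contents.getBytes(StandardCharsets.UTF_8));
            // Flush the file's data and metadata to the device (fsync on POSIX).
            out.getFD().sync();
        }
    }
}
```

Note that this only syncs the file itself; it says nothing about the directory entry that points to it.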

Through strace output, I have noticed ENOEXEC errors for both the successful 
and the unsuccessful (ETXTBSY) cases of the bash -c /path/to/taskjvm.sh calls, 
so I am not sure this is the root cause. However, it doesn't hurt to add the 
#!/bin/sh line.
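Some background on the ENOEXEC part: when a file has no #! line, execve(2) fails with ENOEXEC, and bash then falls back to reading the file and running it as a shell script itself. That would explain ENOEXEC showing up in strace even for successful runs. Below is a small sketch that always emits an explicit interpreter line, assuming the commands are already available as a list of strings. ShebangScript and its methods are hypothetical names, not TaskController's code.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper for illustration; not part of Hadoop.
final class ShebangScript {
    private ShebangScript() {}

    /** Joins the commands into a script whose first line names the interpreter. */
    static String render(List<String> commands) {
        StringBuilder sb = new StringBuilder("#!/bin/sh\n");
        for (String cmd : commands) {
            sb.append(cmd).append('\n');
        }
        return sb.toString();
    }

    /** Writes the script (errors propagate) and marks it executable for the owner. */
    static void write(Path script, List<String> commands) throws IOException {
        Files.write(script, render(commands).getBytes(StandardCharsets.UTF_8));
        if (!script.toFile().setExecutable(true, true)) {
            throw new IOException("could not mark " + script + " executable");
        }
    }
}
```

With the #! line present, the kernel starts /bin/sh directly instead of bash retrying after ENOEXEC. Whether this has anything to do with ETXTBSY is still unconfirmed.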

One thing I am planning to do is capture strace output for the child processes 
forked from the taskjvm.sh script. Another thing I need to try is calling sync 
on the whole directory containing taskjvm.sh.
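Syncing the directory is what makes a newly created directory entry durable. The Linux fsync(2) man page notes that fsync on a file does not necessarily ensure that its entry in the containing directory has reached disk. Below is a minimal sketch, assuming Linux, where a directory can be opened read-only through FileChannel (on Windows this throws). DirectorySync and syncDirectory are hypothetical names.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper for illustration; assumes a Linux/POSIX platform.
final class DirectorySync {
    private DirectorySync() {}

    /** Forces the directory's entries (e.g. a newly created taskjvm.sh) to disk. */
    static void syncDirectory(Path dir) throws IOException {
        // Opening a directory read-only works on Linux; other platforms may reject it.
        try (FileChannel ch = FileChannel.open(dir, StandardOpenOption.READ)) {
            ch.force(true); // fsync on the directory's file descriptor
        }
    }
}
```

The intended call site would be right after the script is written, e.g. syncDirectory(script.getParent()).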

There is probably something at the kernel level that is causing this issue. 
As I mentioned earlier, this problem is much more pronounced on RHEL 6.2 
(stock kernel) than on Ubuntu 11.04 Server.

                
> Should not use PrintWriter to write taskjvm.sh
> ----------------------------------------------
>
>                 Key: MAPREDUCE-2374
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2374
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>             Fix For: 0.22.1
>
>         Attachments: mapreduce-2374-on-20sec.txt
>
>
> Our use of PrintWriter in TaskController.writeCommand is unsafe, since that 
> class swallows all IO exceptions. We're not currently checking for errors, 
> which I'm seeing result in occasional task failures with the message "Text 
> file busy" - assumedly because the close() call is failing silently for some 
> reason.
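To illustrate the behavior the issue describes: PrintWriter catches IOExceptions internally and only records them in an error flag that callers must poll with checkError(), whereas a plain Writer propagates them. In this sketch, FailingCloseWriter is a hypothetical stand-in for a file whose close() fails. It is not TaskController's code.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.Writer;

final class PrintWriterErrors {
    private PrintWriterErrors() {}

    /** Hypothetical Writer whose close() always fails, simulating a failing file close. */
    static final class FailingCloseWriter extends Writer {
        @Override public void write(char[] buf, int off, int len) { /* accept and discard */ }
        @Override public void flush() { }
        @Override public void close() throws IOException {
            throw new IOException("simulated close failure");
        }
    }

    /** PrintWriter.close() never throws; the failure is only visible via checkError(). */
    static boolean closeViaPrintWriter() {
        PrintWriter pw = new PrintWriter(new FailingCloseWriter());
        pw.println("exec java ...");
        pw.close();             // the IOException is caught internally and sets an error flag
        return pw.checkError(); // the only way for the caller to notice
    }

    /** The same write through a plain Writer propagates the failure to the caller. */
    static boolean closeViaWriter() {
        try (Writer w = new FailingCloseWriter()) {
            w.write("exec java ...\n");
        } catch (IOException e) {
            return true; // the close failure is visible here
        }
        return false;
    }
}
```

Both methods return true, but only the second one lets the exception reach code that can react to it. With PrintWriter, nothing happens unless the caller remembers to call checkError().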

