[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13421801#comment-13421801
 ] 

Shrinivas Joshi commented on MAPREDUCE-2374:
--------------------------------------------

I have tested Andy's suggestion of not using the -c switch, and it does resolve
the issue on our test cluster. Since we are thinking of removing the -c switch,
I have also made some changes to avoid potential data loss (from delayed
allocation by file systems like ext4): the IO buffer contents of the
taskjvm.sh file are now committed to the underlying storage before the shell
executor is called. The attached mapreduce-2374-branch-1.patch also removes
the redundant set-permission call and includes the -c switch change that Andy
suggested. If you find these changes useful, please take a look at the
attached patch. In any case, it would be nice if Andy's patch could be
committed.
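
For readers who have not opened the patch, here is a minimal sketch of the
general idea of forcing a script's contents to disk before anything tries to
execute it. It is not the code in mapreduce-2374-branch-1.patch: the class
name DurableScriptWriter and the method writeAndSync are hypothetical, and the
try-with-resources form assumes Java 7 or later.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only; not the code from the patch.
public class DurableScriptWriter {

    /**
     * Writes the script contents and asks the OS to commit them to the
     * storage device before returning. Any failure in write(), sync(), or
     * close() surfaces as an IOException instead of being swallowed.
     */
    public static void writeAndSync(File script, String contents) throws IOException {
        try (FileOutputStream out = new FileOutputStream(script)) {
            out.write(contents.getBytes(StandardCharsets.UTF_8));
            out.flush();
            // FileDescriptor.sync() blocks until buffered data for this
            // descriptor has been written to the underlying device.
            out.getFD().sync();
        }
    }
}
```

The caller would only invoke the shell executor after writeAndSync returns
without throwing.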

Regarding the other queries from Andy and Colin: I have been collecting the
strace output files by adding "strace -o /path/to/strace.output" to the
command run by the same shell executor that calls
"bash -c /path/to/taskjvm.sh". With this change, the frequency of "Text file
busy" errors drops drastically, to the point where one may occur only once in
several runs. I will post the strace output as soon as I see a failing case.
Running strace outside of the Hadoop code is difficult, since it is hard to
tell which of the JVM processes will have a failing task. Let me know if you
have any tricks for capturing strace output.
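
A rough sketch of this kind of wrapping, assuming the strace arguments are
simply placed in front of the existing command list. The class StraceWrapper
and the method wrapWithStrace are hypothetical names for illustration, not
Hadoop APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, for illustration only; not part of Hadoop.
public class StraceWrapper {

    /**
     * Returns a new command list that runs the given command under strace,
     * writing the trace to straceOutput ("-o" is strace's output-file option).
     */
    public static List<String> wrapWithStrace(List<String> command, String straceOutput) {
        List<String> wrapped = new ArrayList<>(Arrays.asList("strace", "-o", straceOutput));
        wrapped.addAll(command);
        return wrapped;
    }
}
```

The resulting list could then be handed to a ProcessBuilder in place of the
original command. A distinct output path per task would be needed to keep the
traces from overwriting each other.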

Andy - could you please clarify your comment about the ProcessBuilder bug? You
should be able to file a bug at bugs.sun.com.
                
> Should not use PrintWriter to write taskjvm.sh
> ----------------------------------------------
>
>                 Key: MAPREDUCE-2374
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2374
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>             Fix For: 0.22.1
>
>         Attachments: failed_taskjvmsh.strace, mapreduce-2374-branch-1.patch, 
> mapreduce-2374-on-20sec.txt, mapreduce-2374.txt, mapreduce-2374.txt, 
> successfull_taskjvmsh.strace
>
>
> Our use of PrintWriter in TaskController.writeCommand is unsafe, since that 
> class swallows all IO exceptions. We're not currently checking for errors, 
> which I'm seeing result in occasional task failures with the message "Text 
> file busy" - assumedly because the close() call is failing silently for some 
> reason.
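
To illustrate the behavior described in the issue summary above, the following
standalone sketch shows PrintWriter hiding IO failures unless checkError() is
consulted. It is not Hadoop code: FailingWriter is a hypothetical stand-in for
a stream whose write and close calls fail.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.Writer;

// Standalone demonstration; FailingWriter is a made-up stand-in for a bad disk.
public class PrintWriterSwallowDemo {

    /** A Writer whose every operation fails with an IOException. */
    static class FailingWriter extends Writer {
        @Override
        public void write(char[] buf, int off, int len) throws IOException {
            throw new IOException("simulated write failure");
        }

        @Override
        public void flush() throws IOException {
            throw new IOException("simulated flush failure");
        }

        @Override
        public void close() throws IOException {
            throw new IOException("simulated close failure");
        }
    }

    /**
     * Writes through a PrintWriter and returns its error flag. No exception
     * reaches the caller even though every underlying call failed.
     */
    public static boolean writeWithPrintWriter(String text) {
        PrintWriter pw = new PrintWriter(new FailingWriter());
        pw.println(text); // IOException is caught internally
        pw.close();       // so is the failure from close()
        return pw.checkError();
    }
}
```

Code that never calls checkError() has no way of knowing the write or close
failed, which is why writing through a stream that throws IOException directly
is the safer choice here.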

