[ https://issues.apache.org/jira/browse/MAPREDUCE-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13421801#comment-13421801 ]
Shrinivas Joshi commented on MAPREDUCE-2374:
--------------------------------------------
I have tested Andy's suggestion of not using the -c switch, and it does resolve
the issue on our test cluster. Since we are considering removing the -c switch,
and to avoid potential data loss (from delayed allocation in file systems such
as ext4), I have made changes so that the I/O buffer contents of the
taskjvm.sh file are committed to the underlying storage before the shell
executor is called. The mapreduce-2374-branch-1.patch also removes the
redundant set-permission call, and it includes the -c switch change that Andy
suggested. If you find these changes useful, please take a look at the
attached patch. In any case, it would be nice if Andy's patch could be
committed.
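A minimal Java sketch of the two changes described above, assuming the script
contents are available as a list of lines. writeScript and buildCommand are
hypothetical names, not the code in mapreduce-2374-branch-1.patch. The script
is written through a FileOutputStream so every IOException propagates, and
FileDescriptor.sync() forces the contents to storage before bash is started.
The command passes the script path to bash as a file argument instead of
using -c: with -c, bash asks the kernel to execute the file itself, which
fails with ETXTBSY ("Text file busy") if any process still has it open for
writing, whereas a plain argument is simply opened and read by bash.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class TaskScriptWriter {

    // Hypothetical helper: writes the script and forces it to disk.
    // Every IOException (including SyncFailedException) reaches the caller.
    static void writeScript(File script, List<String> lines) throws IOException {
        try (FileOutputStream out = new FileOutputStream(script)) {
            for (String line : lines) {
                out.write((line + "\n").getBytes(StandardCharsets.UTF_8));
            }
            // Commit the file contents to the storage device before the
            // script is run (guards against delayed allocation, e.g. ext4).
            out.getFD().sync();
        } // try-with-resources closes the stream; a failing close() is thrown
    }

    // Hypothetical helper: runs the script as "bash <path>" rather than
    // "bash -c <path>", so bash reads the file instead of exec'ing it.
    static List<String> buildCommand(File script) {
        return Arrays.asList("bash", script.getAbsolutePath());
    }

    public static void main(String[] args) throws Exception {
        File script = File.createTempFile("taskjvm", ".sh");
        script.deleteOnExit();
        writeScript(script, Arrays.asList("#!/bin/bash", "echo hello"));
        Process p = new ProcessBuilder(buildCommand(script)).inheritIO().start();
        System.out.println("exit code: " + p.waitFor());
    }
}
```

Because bash only reads the file here, the script does not need the execute
bit for this launch path.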
Regarding the other questions from Andy and Colin: I have been collecting
strace output files by prepending "strace -o /path/to/strace.output" to the
command run by the same shell executor that calls "bash -c /path/to/taskjvm.sh".
With this change, the frequency of "Text file busy" errors drops drastically,
to the point where one may occur only once in several runs. I will post the
strace output as soon as I see a failing case. Running strace outside the
Hadoop code is difficult, since it is hard to tell which of the JVM processes
will have a failing task. Let me know if you have any tricks for capturing
strace output.
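The strace wrapping described above amounts to putting "strace -o <file>" in
front of the existing command line. A minimal sketch, assuming the command is
held as a List<String> as ProcessBuilder expects; wrapWithStrace is a
hypothetical helper, not Hadoop's shell executor API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StraceWrapper {

    // Hypothetical helper: prefixes a command with "strace -o <outputFile>"
    // so the traced process's system calls are written to outputFile.
    static List<String> wrapWithStrace(List<String> command, String outputFile) {
        List<String> wrapped = new ArrayList<>(Arrays.asList("strace", "-o", outputFile));
        wrapped.addAll(command);
        return wrapped;
    }

    public static void main(String[] args) {
        List<String> original = Arrays.asList("bash", "-c", "/path/to/taskjvm.sh");
        List<String> traced = wrapWithStrace(original, "/path/to/strace.output");
        // The wrapped list would be handed to the executor instead of `original`.
        System.out.println(String.join(" ", traced));
        // prints: strace -o /path/to/strace.output bash -c /path/to/taskjvm.sh
    }
}
```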
Andy, could you please clarify your comment about the ProcessBuilder bug? You
should be able to file a bug at bugs.sun.com.
> Should not use PrintWriter to write taskjvm.sh
> ----------------------------------------------
>
> Key: MAPREDUCE-2374
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2374
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 0.22.0
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Fix For: 0.22.1
>
> Attachments: failed_taskjvmsh.strace, mapreduce-2374-branch-1.patch,
> mapreduce-2374-on-20sec.txt, mapreduce-2374.txt, mapreduce-2374.txt,
> successfull_taskjvmsh.strace
>
>
> Our use of PrintWriter in TaskController.writeCommand is unsafe, since that
> class swallows all IO exceptions. We're not currently checking for errors,
> which I'm seeing result in occasional task failures with the message "Text
> file busy", presumably because the close() call is failing silently for some
> reason.
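The PrintWriter behaviour described in the issue can be reproduced in
isolation. This sketch uses a hypothetical FailingOnCloseWriter whose close()
always throws, standing in for a real I/O error; it is not TaskController
code. PrintWriter catches the exception and only records it for
checkError(), while a BufferedWriter over the same Writer lets the
IOException reach the caller.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.Writer;

public class PrintWriterErrors {

    // Hypothetical Writer: accepts data but fails on close(), simulating
    // an I/O error when the script file is closed.
    static class FailingOnCloseWriter extends Writer {
        @Override public void write(char[] buf, int off, int len) { }
        @Override public void flush() { }
        @Override public void close() throws IOException {
            throw new IOException("simulated close failure");
        }
    }

    // PrintWriter swallows the IOException; only checkError() reveals it.
    static boolean printWriterReportsError() {
        PrintWriter pw = new PrintWriter(new FailingOnCloseWriter());
        pw.println("#!/bin/bash");
        pw.close();             // does not throw
        return pw.checkError(); // true: the failure was recorded, not raised
    }

    // A BufferedWriter propagates the same failure from close().
    static boolean bufferedWriterThrows() {
        try {
            Writer w = new BufferedWriter(new FailingOnCloseWriter());
            w.write("#!/bin/bash\n");
            w.close();
            return false;
        } catch (IOException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("PrintWriter.checkError(): " + printWriterReportsError()); // true
        System.out.println("BufferedWriter threw: " + bufferedWriterThrows());        // true
    }
}
```

Calling checkError() after close() would at least detect the failure, but a
Writer that throws makes it impossible for the caller to ignore.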