[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13421922#comment-13421922
 ] 

Andy Isaacson commented on MAPREDUCE-2374:
------------------------------------------

bq. I have tested Andy's suggestion of not using -c switch. It does resolve the 
issue on our test cluster.
Thanks for testing!  This is great news.

bq. to avoid potential data loss issues (from delayed allocation by file 
systems like ext4) I have made some changes so that the IO buffer contents of 
taskjvm.sh file are committed to the underlying storage before shell executor 
is called.

I'm a bit confused why you're concerned about crash consistency here.  AFAIK, 
ext4 delayed allocation is *completely* invisible to application code, unless 
the machine crashes and you're recovering afterwards.

(OK, that's not quite true since you could use {{filefrag}} to find out where 
the allocations are, or maybe you could use high-resolution timers to notice 
seek-induced IO timing discontinuities, but those are not relevant to this 
discussion.)

Since the taskjvm.sh script is used immediately and is not reused across a 
kernel crash, why do you care whether the IO buffer is synced to stable 
storage?
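
To make the point above concrete, here is a minimal sketch (Python rather than Hadoop's Java, and not code from the patch) of writing a file with plain write(2) calls and no fsync, then reading it back through a separate file descriptor.  The function name {{visible_without_fsync}} and the file name are hypothetical.  The reader sees the data because both descriptors go through the same kernel page cache; fsync only matters for what survives a crash.

```python
import os
import tempfile


def visible_without_fsync(payload: bytes) -> bytes:
    """Write payload with no fsync, then read it back via a second descriptor."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "taskjvm-demo.sh")  # hypothetical file name
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        try:
            view = memoryview(payload)
            while view:
                # os.write may write fewer bytes than requested, so loop.
                view = view[os.write(fd, view):]
            # Deliberately no os.fsync(fd): nothing is forced to stable storage.
        finally:
            os.close(fd)
        # A fresh descriptor still observes every byte written above.
        with open(path, "rb") as reader:
            return reader.read()
```

Whether the blocks have been allocated on disk by the time of the read is not observable through this interface, which is why delayed allocation should not matter for a script that is written and then executed right away.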

Looking at mapreduce-2374-branch-1.patch, I see that you've made two related 
changes, one getting rid of the {{BufferedOutputStream}} from 
{{RawLocalFileSystem#create}} and another adding calls to 
{{FSDataOutputStream#flush}} and {{#sync}}.  I don't see how either of those 
changes can make much of a difference given that we call {{w.close}} 8 lines 
down in the finally block.
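
As a rough illustration of why an extra flush just before close should not change the outcome, here is a small Python sketch; it is a stand-in for the Java stream classes, not the patched Hadoop code, and the function name {{sizes_before_and_after_close}} is hypothetical.  A buffered writer holds a small write in user space until it is flushed, and close performs that flush itself (as far as I know, {{BufferedOutputStream#close}} in Java likewise flushes before closing the underlying stream).

```python
import os
import tempfile


def sizes_before_and_after_close(payload: bytes) -> tuple:
    """Return the on-disk file size before and after closing a buffered writer."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "buffered-demo.sh")  # hypothetical file name
        # 64 KiB user-space buffer, so a small payload is not written through.
        writer = open(path, "wb", buffering=64 * 1024)
        try:
            writer.write(payload)
            before = os.path.getsize(path)  # payload is still in the buffer
        finally:
            writer.close()  # close() flushes the buffer, then closes the fd
        after = os.path.getsize(path)
        return before, after
```

With a payload smaller than the buffer, the size is 0 before close and the full length afterwards, without any explicit flush or sync call.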

bq. with strace the frequency of Text busy errors goes down drastically
Not too surprising; to get ETXTBSY we have to reach the relevant {{execve}} 
before the forked process that's holding the fd exits.  Running under strace 
slows down the shell and adds another child process into the scheduling mix, 
which makes that window harder to hit.
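
The underlying condition can be sketched in a few lines of Python.  This is a simplified, deterministic version of the situation, not the actual race: here the calling process itself keeps a write descriptor open, whereas in the bug it is a forked child that briefly inherits one.  The function name {{exec_while_open_for_write}} and the script name are hypothetical, and the result is platform dependent: I would expect ETXTBSY on Linux, but other systems, or a temp directory mounted noexec, may behave differently.

```python
import os
import subprocess
import tempfile


def exec_while_open_for_write():
    """Try to execute a script while a write descriptor to it is still open.

    Returns the errno of the failure, or None if the script ran.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sh")  # hypothetical script name
        with open(path, "w") as script:
            script.write("#!/bin/sh\nexit 0\n")
        os.chmod(path, 0o700)
        # Stand-in for the fd that a forked process would still be holding.
        fd = os.open(path, os.O_WRONLY)
        try:
            subprocess.run([path])
        except OSError as err:
            # Failures of the child's exec are re-raised in the parent.
            return err.errno
        finally:
            os.close(fd)
        return None
```

Once the last write descriptor is closed the same exec succeeds, which is why anything that delays the {{execve}} relative to the exiting child makes the error rarer.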
                
> Should not use PrintWriter to write taskjvm.sh
> ----------------------------------------------
>
>                 Key: MAPREDUCE-2374
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2374
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>             Fix For: 0.22.1
>
>         Attachments: failed_taskjvmsh.strace, mapreduce-2374-branch-1.patch, 
> mapreduce-2374-on-20sec.txt, mapreduce-2374.txt, mapreduce-2374.txt, 
> successfull_taskjvmsh.strace
>
>
> Our use of PrintWriter in TaskController.writeCommand is unsafe, since that 
> class swallows all IO exceptions. We're not currently checking for errors, 
> which I'm seeing result in occasional task failures with the message "Text 
> file busy", presumably because the close() call is failing silently for some 
> reason.
