[ 
https://issues.apache.org/jira/browse/HADOOP-10146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp updated HADOOP-10146:
---------------------------------

    Attachment: HADOOP-10129.patch
                HADOOP-10129.branch-23.patch

This patch uses a hack to work around the bug.  Synchronizing on the streams 
before closing them dovetails with the synchronized 
{{ProcessPipeInputStream.drainInputStream}}, so a close has to wait for any 
drain already in progress.  The hack is a safe no-op on JDK6 because JDK6 does 
not drain the streams.
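
As a rough illustration of the idea (not the patch itself), the sketch below 
closes a reader while holding the monitor of the stream returned by 
{{Process.getInputStream()}}.  The class {{SyncCloseSketch}} and the helper 
{{closeWhileHolding}} are hypothetical names, and the {{echo}} command assumes 
a Unix-like host.

```java
import java.io.BufferedReader;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

class SyncCloseSketch {
    /**
     * Closes the resource while holding the monitor of the given lock object.
     * Any other thread that synchronizes on the same object (for example a
     * synchronized method of the stream itself) cannot run concurrently
     * with this close.
     */
    static void closeWhileHolding(Object lock, Closeable resource) throws IOException {
        synchronized (lock) {
            resource.close();
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumption: an "echo" binary is on the PATH.
        Process process = new ProcessBuilder("echo", "hello").start();
        InputStream stdout = process.getInputStream();
        BufferedReader reader = new BufferedReader(new InputStreamReader(stdout));

        String line = reader.readLine();
        process.waitFor();

        // Lock on the process stream, then close the reader wrapping it.
        closeWhileHolding(stdout, reader);
        System.out.println(line);
    }
}
```

The lock object is the process stream rather than the reader because, per the 
description above, that is the object the JDK's own draining synchronizes on.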

Y! has been using this patch in production for 8 months.  The problem was 
immediately reported to Oracle, but a fix will not be available until around 
mid-year, so we're providing this workaround to the community.

> Workaround JDK7 Process fd close bug
> ------------------------------------
>
>                 Key: HADOOP-10146
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10146
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: util
>    Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>            Priority: Critical
>         Attachments: HADOOP-10129.branch-23.patch, HADOOP-10129.patch
>
>
> JDK7's {{Process}} output streams have an async fd-close race bug.  This 
> manifests as commands run via o.a.h.u.Shell causing threads to hang, OOM, or 
> exhibit other bizarre behavior.  The NM is likely to encounter the bug under 
> heavy load.
> Specifically, {{ProcessBuilder}}'s {{UNIXProcess}} starts a thread to reap 
> the process and drain stdout/stderr to avoid a lingering zombie process.  A 
> race occurs if the thread using the stream closes it and the underlying fd is 
> recycled/reopened while the reaper is draining it.  
> {{ProcessPipeInputStream.drainInputStream}} will OOM allocating an array if 
> {{in.available()}} returns a huge number, or may wreak havoc by incorrectly 
> draining the fd.
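
The allocation hazard described above can be illustrated with a simplified 
drain loop.  This is a sketch written for illustration, not the JDK source; 
{{DrainSketch}} and {{drain}} are hypothetical names.  The buffer is sized 
directly from {{available()}}, so a recycled fd that reports a huge value 
would trigger an equally huge allocation.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class DrainSketch {
    /**
     * Reads whatever the stream reports as available and returns it,
     * or null if nothing was available. The array size comes straight
     * from available(), which is the risky part: nothing bounds it.
     */
    static byte[] drain(InputStream in) throws IOException {
        int filled = 0;
        byte[] buffer = null;
        int available;
        while ((available = in.available()) > 0) {
            // Grow (or create) the buffer by exactly what available() claims.
            buffer = (buffer == null)
                    ? new byte[available]
                    : Arrays.copyOf(buffer, filled + available);
            int read = in.read(buffer, filled, available);
            if (read < 0) {
                break; // end of stream reached despite available() > 0
            }
            filled += read;
        }
        // Trim the buffer if fewer bytes were read than were allocated.
        return (buffer == null || filled == buffer.length)
                ? buffer
                : Arrays.copyOf(buffer, filled);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "straggler output".getBytes(StandardCharsets.UTF_8);
        byte[] drained = drain(new ByteArrayInputStream(data));
        System.out.println(new String(drained, StandardCharsets.UTF_8));
    }
}
```

With a well-behaved stream the loop simply collects the leftover bytes.  The 
trouble arises only when the fd behind the stream is no longer the process 
pipe.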



--
This message was sent by Atlassian JIRA
(v6.1#6144)
