[ https://issues.apache.org/jira/browse/HADOOP-10146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daryn Sharp updated HADOOP-10146:
---------------------------------
Attachment: HADOOP-10129.patch
HADOOP-10129.branch-23.patch
This patch uses a hack to work around the bug. Synchronizing on the streams
before closing them dovetails with the synchronized
{{ProcessPipeInputStream.drainInputStream}}. The hack is a safe no-op on JDK6
because JDK6 does not drain the streams.
Y! has been using this patch in production for 8 months. The problem was
immediately reported to Oracle, but a fix will not be available until around
mid-year, so we're providing this workaround to the community.
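For readers without the attachments at hand, the synchronize-before-close idea might look roughly like the sketch below. It is not the attached patch: the class name {{SyncCloseSketch}} and both helper methods are hypothetical, and it assumes the reaper's drain holds the monitor of the same stream object returned by {{Process.getInputStream()}}.

```java
import java.io.BufferedReader;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

// Hypothetical illustration only; names are not taken from the Hadoop patch.
class SyncCloseSketch {

    // Close 'reader' while holding the monitor of the raw process stream.
    // Assumption: the JDK7 reaper drains under the same monitor, so the
    // close cannot interleave with a drain that is already in progress.
    static void closeWhileHoldingLock(InputStream rawStream, Closeable reader)
            throws IOException {
        synchronized (rawStream) {
            reader.close();
        }
    }

    // Run a command, collect its output, and close stdout under the lock.
    static String runAndClose(String... command)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .redirectErrorStream(true)   // merge stderr into stdout
                .start();
        InputStream stdout = process.getInputStream();
        BufferedReader reader = new BufferedReader(new InputStreamReader(stdout));
        StringBuilder output = new StringBuilder();
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                output.append(line).append('\n');
            }
            process.waitFor();
        } finally {
            closeWhileHoldingLock(stdout, reader);
        }
        return output.toString();
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: SyncCloseSketch <command> [args...]");
            return;
        }
        System.out.print(runAndClose(args));
    }
}
```

On a JDK that never drains the streams, the extra {{synchronized}} block only takes an uncontended lock, which is consistent with the no-op behavior described for JDK6.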
> Workaround JDK7 Process fd close bug
> ------------------------------------
>
> Key: HADOOP-10146
> URL: https://issues.apache.org/jira/browse/HADOOP-10146
> Project: Hadoop Common
> Issue Type: Bug
> Components: util
> Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0
> Reporter: Daryn Sharp
> Assignee: Daryn Sharp
> Priority: Critical
> Attachments: HADOOP-10129.branch-23.patch, HADOOP-10129.patch
>
>
> JDK7's {{Process}} output streams have an async fd-close race bug. This
> manifests as commands run via o.a.h.u.Shell causing threads to hang, OOM, or
> exhibit other bizarre behavior. The NM is likely to encounter the bug under
> heavy load.
> Specifically, {{ProcessBuilder}}'s {{UNIXProcess}} starts a thread to reap
> the process and drain stdout/stderr to avoid a lingering zombie process. A
> race occurs if the thread using the stream closes it and the underlying fd is
> recycled/reopened while the reaper is still draining it.
> {{ProcessPipeInputStream.drainInputStream}} will OOM allocating an array if
> {{in.available()}} returns a huge number, or may wreak havoc by incorrectly
> draining the fd.
--
This message was sent by Atlassian JIRA
(v6.1#6144)