[ https://issues.apache.org/jira/browse/HADOOP-11802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498828#comment-14498828 ]
Eric Payne commented on HADOOP-11802:
-------------------------------------
[~cmccabe], thanks very much for the patch.
I was able to manually verify that the patch fixes the problem we were
encountering when the {{DomainSocketWatcher}} main thread was dying. I used the
same method as before to trigger the exception in
{{DataXceiver#requestShortCircuitShm}}, and the main thread of
{{DomainSocketWatcher}} now keeps running.
However, I don't think the unit test verifies this use case. Here's what I
did:
1. I patched branch-2 with {{HADOOP-11802.002.patch}}, built it, and ran
{{TestShortCircuitCache#testDataXceiverHandlesRequestShortCircuitShmFailure}}.
The test passed.
2. I commented out the following code in {{DataXceiver#requestShortCircuitShm}}:
{code}
if ((!success) && releasedSocket) {
  try {
    sock.shutdown();
  } catch (IOException e) {
    LOG.warn("Failed to shut down socket in error handler", e);
  }
}
{code}
and replaced it with the original code:
{code}
if ((!success) && (peer == null)) {
  IOUtils.cleanup(null, sock);
}
{code}
The test still passed, even though the original code is what terminates the
{{DomainSocketWatcher}} thread.
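To show why a test needs to check the watcher thread itself, here is a toy model in plain Java. It is not Hadoop code. {{WatcherSketch}}, {{ManagedSocket}}, {{Watcher}} and {{pollSurvives}} are hypothetical names. The model assumes only the contract described in the issue: the watcher alone may close a socket it manages, and other threads must call {{shutdown}} instead. A test against the real classes would presumably need an equivalent liveness assertion on the {{DomainSocketWatcher}} thread.

{code:java}
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Toy model of the DomainSocketWatcher ownership invariant. None of these
// classes are Hadoop APIs; they only mimic the close-vs-shutdown contract.
public class WatcherSketch {
    enum State { OPEN, SHUT_DOWN, CLOSED }

    // Hypothetical stand-in for a socket registered with the watcher.
    static final class ManagedSocket {
        volatile State state = State.OPEN;

        // Marks the socket for the watcher; it is not closed here.
        void shutdown() {
            if (state == State.OPEN) {
                state = State.SHUT_DOWN;
            }
        }

        void close() {
            state = State.CLOSED;
        }
    }

    // Hypothetical watcher: the only party allowed to close a managed socket.
    static final class Watcher {
        private final List<ManagedSocket> sockets = new ArrayList<>();

        void add(ManagedSocket s) { sockets.add(s); }

        int size() { return sockets.size(); }

        // One pass of the watcher's main loop.
        void poll() {
            for (Iterator<ManagedSocket> it = sockets.iterator(); it.hasNext(); ) {
                ManagedSocket s = it.next();
                if (s.state == State.CLOSED) {
                    // Closed behind the watcher's back: the exception escapes
                    // the loop and ends the watcher thread.
                    throw new IllegalStateException("managed socket was closed externally");
                }
                if (s.state == State.SHUT_DOWN) {
                    // Stop watching the socket, then close it from the watcher.
                    it.remove();
                    s.close();
                }
            }
        }
    }

    // Runs one poll on its own thread; returns true if that thread finished normally.
    static boolean pollSurvives(Watcher w) throws InterruptedException {
        final boolean[] finished = {false};
        Thread t = new Thread(() -> {
            w.poll();
            finished[0] = true;
        });
        t.setUncaughtExceptionHandler((thread, e) -> { }); // keep the demo output quiet
        t.start();
        t.join();
        return finished[0];
    }

    public static void main(String[] args) throws InterruptedException {
        Watcher patched = new Watcher();
        ManagedSocket a = new ManagedSocket();
        patched.add(a);
        a.shutdown(); // models the error handler after the patch
        System.out.println("shutdown: survives=" + pollSurvives(patched)
                + " state=" + a.state + " managed=" + patched.size());

        Watcher original = new Watcher();
        ManagedSocket b = new ManagedSocket();
        original.add(b);
        b.close(); // models the original handler (IOUtils.cleanup closes the socket)
        System.out.println("close: survives=" + pollSurvives(original));
    }
}
{code}

In this model, both handlers stop the client from using the socket. Only the watcher's survival tells them apart. That fits the unit test passing in both runs above.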
> DomainSocketWatcher thread terminates sometimes after there is an I/O error during requestShortCircuitShm
> ---------------------------------------------------------------------------------------------------------
>
> Key: HADOOP-11802
> URL: https://issues.apache.org/jira/browse/HADOOP-11802
> Project: Hadoop Common
> Issue Type: Bug
> Affects Versions: 2.7.0
> Reporter: Eric Payne
> Assignee: Colin Patrick McCabe
> Attachments: HADOOP-11802.001.patch, HADOOP-11802.002.patch
>
>
> In {{DataXceiver#requestShortCircuitShm}}, we attempt to recover from some
> errors by closing the {{DomainSocket}}. However, this violates the invariant
> that the domain socket should never be closed when it is being managed by the
> {{DomainSocketWatcher}}. Instead, we should call {{shutdown}} on the
> {{DomainSocket}}. When this bug hits, it terminates the
> {{DomainSocketWatcher}} thread.