[ 
https://issues.apache.org/jira/browse/HADOOP-11802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498828#comment-14498828
 ] 

Eric Payne commented on HADOOP-11802:
-------------------------------------

[~cmccabe], thanks very much for the patch.

I manually verified that the patch fixes the problem we were encountering, 
in which {{DomainSocketWatcher}}'s main thread was dying. Using the same 
method as before to trigger the exception in 
{{DataXceiver#requestShortCircuitShm}}, I confirmed that the main thread of 
{{DomainSocketWatcher}} keeps running.

However, I don't think the unit test is verifying this use case. Here's what I 
did:
1. I patched branch-2 with {{HADOOP-11802.002.patch}}, built it, and ran the 
test for 
{{TestShortCircuitCache#testDataXceiverHandlesRequestShortCircuitShmFailure}}. 
This was successful.
2. I commented out the following code in {{DataXceiver#requestShortCircuitShm}}:
{code}
      if ((!success) && releasedSocket) {
        try {
          sock.shutdown();
        } catch (IOException e) {
          LOG.warn("Failed to shut down socket in error handler", e);
        }
      }
{code}
and replaced it with the original code:
{code}
      if ((!success) && (peer == null)) {
        IOUtils.cleanup(null, sock);
      }
{code}
The test also succeeded with the original code restored, so it does not appear 
to fail when the fix is absent.
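
To illustrate what a stronger test would need to check, here is a toy, 
single-threaded model of the invariant from the issue description. It is not 
Hadoop code: {{ToySocket}}, {{ToyWatcher}}, and {{watcherSurvives}} are 
hypothetical names, and the assumption that the watcher closes a socket itself 
once it sees the shutdown is mine. The sketch only shows that the check has to 
distinguish the {{shutdown}} path from the {{close}} path by looking at whether 
the watcher is still alive afterwards.

```java
import java.util.ArrayList;
import java.util.List;

public class WatcherInvariantSketch {
    /** Hypothetical stand-in for a domain socket; not the Hadoop class. */
    static final class ToySocket {
        private boolean open = true;
        private boolean shutDown = false;

        void shutdown() { shutDown = true; }  // stop I/O, descriptor stays valid
        void close() { open = false; }        // release the descriptor
        boolean isOpen() { return open; }
        boolean isShutDown() { return shutDown; }
    }

    /** Hypothetical stand-in for the watcher loop (assumed behaviour). */
    static final class ToyWatcher {
        private final List<ToySocket> watched = new ArrayList<>();
        private boolean alive = true;

        void add(ToySocket s) { watched.add(s); }

        /** One loop iteration; a closed watched socket kills the watcher. */
        void pollOnce() {
            if (!alive) {
                return;
            }
            try {
                // Iterate over a copy so sockets can be removed from the list.
                for (ToySocket s : new ArrayList<>(watched)) {
                    if (!s.isOpen()) {
                        throw new IllegalStateException("watched socket was closed");
                    }
                    if (s.isShutDown()) {
                        // Assumption: the watcher owns the socket and closes it.
                        watched.remove(s);
                        s.close();
                    }
                }
            } catch (IllegalStateException e) {
                alive = false;  // models the watcher thread terminating
            }
        }

        boolean isAlive() { return alive; }
    }

    /** Runs the error path with either shutdown() or close(). */
    static boolean watcherSurvives(boolean useShutdown) {
        ToyWatcher watcher = new ToyWatcher();
        ToySocket sock = new ToySocket();
        watcher.add(sock);
        if (useShutdown) {
            sock.shutdown();
        } else {
            sock.close();
        }
        watcher.pollOnce();
        return watcher.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("shutdown: watcher alive = " + watcherSurvives(true));
        System.out.println("close:    watcher alive = " + watcherSurvives(false));
    }
}
```

In this model, a test that only checks that the request fails would pass on 
both paths. A test that asserts the watcher is still alive would pass only on 
the {{shutdown}} path.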

> DomainSocketWatcher thread terminates sometimes after there is an I/O error 
> during requestShortCircuitShm
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-11802
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11802
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 2.7.0
>            Reporter: Eric Payne
>            Assignee: Colin Patrick McCabe
>         Attachments: HADOOP-11802.001.patch, HADOOP-11802.002.patch
>
>
> In {{DataXceiver#requestShortCircuitShm}}, we attempt to recover from some 
> errors by closing the {{DomainSocket}}.  However, this violates the invariant 
> that the domain socket should never be closed when it is being managed by the 
> {{DomainSocketWatcher}}.  Instead, we should call {{shutdown}} on the 
> {{DomainSocket}}.  When this bug hits, it terminates the 
> {{DomainSocketWatcher}} thread.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
