[
https://issues.apache.org/jira/browse/HADOOP-8980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509416#comment-13509416
]
Chris Nauroth commented on HADOOP-8980:
---------------------------------------
Hi, Xuan. I took another look at the {{TestRPC#testErrorMsgForInsecureClient}}
failure, and I still think it's a race condition on the server side.
Specifically, {{Server.Connection#readAndProcess}} calls
{{Server.Connection#initializeAuthContext}} to check authentication, and if it
fails, sets up the "authentication is not enabled" response, and enqueues the
response by calling {{responder.doRespond(authFailedCall)}}. The responder
runs a separate thread that loops, dequeues responses, and writes them in
{{Server.Responder#processResponse}}. Meanwhile, back in the thread of
{{Server.Connection#readAndProcess}}, the authentication failure also causes it
to throw an IOException. This propagates up to {{Server.Reader#doRead}}, which
closes the connection. If the connection is closed before the responder thread
gets a chance to write the response, then the client never receives the
expected response message, and instead we get this exception about connection
abort. Windows appears to schedule the threads consistently in the order that
exposes this problem.
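To make the ordering concrete, here is a minimal sketch of the two
interleavings described above. It is not the Hadoop code: {{FakeConnection}},
the plain queue standing in for the responder, and {{simulate}} are
hypothetical names, and the sketch replays each ordering on a single thread
rather than racing real threads, so the outcome is deterministic.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class ResponderRaceSketch {
    static final String AUTH_FAILED = "authentication is not enabled";

    /** Hypothetical stand-in for a server-side connection; not Hadoop's Server.Connection. */
    static final class FakeConnection {
        private boolean open = true;
        final List<String> deliveredToClient = new ArrayList<>();

        /** A write only reaches the client while the connection is still open. */
        void write(String response) {
            if (open) {
                deliveredToClient.add(response);
            }
        }

        void close() {
            open = false;
        }
    }

    /** Runs every queued response write, as the responder loop would (assumed model). */
    static void drain(Queue<Runnable> responderQueue) {
        Runnable task;
        while ((task = responderQueue.poll()) != null) {
            task.run();
        }
    }

    /** Replays one interleaving and returns what the client ended up receiving. */
    static List<String> simulate(boolean closeBeforeResponderRuns) {
        FakeConnection connection = new FakeConnection();
        Queue<Runnable> responderQueue = new ArrayDeque<>();

        // Reader side: authentication fails, so the failure response is enqueued.
        responderQueue.add(() -> connection.write(AUTH_FAILED));

        if (closeBeforeResponderRuns) {
            connection.close();     // reader closes the connection first
            drain(responderQueue);  // responder runs too late; the write is lost
        } else {
            drain(responderQueue);  // responder writes the response first
            connection.close();     // the close then happens harmlessly
        }
        return connection.deliveredToClient;
    }

    public static void main(String[] args) {
        System.out.println("close wins the race:     " + simulate(true));
        System.out.println("responder wins the race: " + simulate(false));
    }
}
```

In this model the client receives nothing when the close wins, which matches
the symptom seen in the test.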
It's possible that your experiment to insert a Thread.sleep in the client-side
code interfered with the thread scheduling in such a way that it masked the
problem and made the test pass. The client and server in this test run on the
same machine, in the same process, so a client-side sleep can also change when
the server threads run.
In order to validate my theory that it's a server-side race condition, I came
up with an experiment that doesn't involve inserting sleep calls that might
interfere with timing. In {{Server.Reader#doRead}}, I commented out the
{{closeConnection(c)}} call. The test consistently passed when I did this, so
I think that validates the theory that it's a server-side problem, and that one
side of the race condition is the connection close.
This might indicate that we need to change the {{Server}} code to send the
"authentication is not enabled" response synchronously, bypassing the
{{Responder}} queue, or to find some other way to chain the connection close
so that it happens only after the response has been written normally from the
queue.
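As a rough illustration of those two options, the sketch below models each
one. It is only a sketch under stated assumptions, not a proposed patch:
{{FakeConnection}}, {{respondSynchronously}} and
{{chainCloseAfterQueuedResponse}} are hypothetical names, and the responder
is modeled as a plain queue drained on the calling thread.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class AuthFailedResponseFixSketch {
    static final String AUTH_FAILED = "authentication is not enabled";

    /** Hypothetical stand-in for a server-side connection; not Hadoop's Server.Connection. */
    static final class FakeConnection {
        private boolean open = true;
        final List<String> deliveredToClient = new ArrayList<>();

        /** A write only reaches the client while the connection is still open. */
        void write(String response) {
            if (open) {
                deliveredToClient.add(response);
            }
        }

        void close() {
            open = false;
        }

        boolean isOpen() {
            return open;
        }
    }

    /** Option 1: the reader writes the response itself, bypassing the queue, then closes. */
    static FakeConnection respondSynchronously() {
        FakeConnection connection = new FakeConnection();
        connection.write(AUTH_FAILED);
        connection.close();
        return connection;
    }

    /** Option 2: the close is queued behind the response, so it cannot overtake the write. */
    static FakeConnection chainCloseAfterQueuedResponse() {
        FakeConnection connection = new FakeConnection();
        Queue<Runnable> responderQueue = new ArrayDeque<>();
        responderQueue.add(() -> connection.write(AUTH_FAILED));
        responderQueue.add(connection::close);

        // Assumed responder loop: tasks run strictly in the order they were queued.
        Runnable task;
        while ((task = responderQueue.poll()) != null) {
            task.run();
        }
        return connection;
    }

    public static void main(String[] args) {
        System.out.println("synchronous:   " + respondSynchronously().deliveredToClient);
        System.out.println("chained close: " + chainCloseAfterQueuedResponse().deliveredToClient);
    }
}
```

In both variants the write is ordered before the close by construction, so
the result no longer depends on how the threads happen to be scheduled.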
> TestRPC fails on Windows
> ------------------------
>
> Key: HADOOP-8980
> URL: https://issues.apache.org/jira/browse/HADOOP-8980
> Project: Hadoop Common
> Issue Type: Bug
> Components: ipc
> Affects Versions: trunk-win
> Reporter: Chris Nauroth
> Assignee: Chris Nauroth
>
> This failure may indicate a difference in socket handling on Windows.