[
https://issues.apache.org/jira/browse/KUDU-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15950282#comment-15950282
]
Todd Lipcon commented on KUDU-1963:
-----------------------------------
Dumping state after a bit more debugging:
My current theory is that there is some race where we have the following:
Thread 1 (app thread): calls Channel.close()
Thread 2 (netty upstream handler thread): is handling some negotiation message
and calls Channels.write(...)
These two things aren't synchronized against each other. It seems that
Channel.close() triggers SslHandler to try to send an SSL "close" message of
some kind, and that "close" message isn't properly synchronized against other
threads trying to call write(). This causes the other thread to get an
inaccurate return code that a "handshake" is in progress, whereas in fact it's
a "close-shake" of some kind. This then propagates up as a "renegotiation" in
the error message back to us.
A few ideas:
- could try to push the Channel.close() call onto the upstream-handler thread
so it's synchronized against other activity
- could try to override SslHandler with our own implementation that prevents it
from trying to do the graceful close
- could just catch SSLException and ignore it if we see that
closedByClient=true.
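
A rough sketch of the first and third ideas, using only JDK classes. This is
not the Kudu client or Netty code: SketchConnection, ioThread, channelOpen and
shouldIgnore are hypothetical names, and a single-thread executor merely stands
in for the upstream-handler thread. Only closedByClient and SSLException come
from the discussion above. write() and close() block on the I/O task to keep
the example short, so they must not be called from the I/O thread itself.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import javax.net.ssl.SSLException;

/** Hypothetical stand-in for a client connection; not Kudu's or Netty's API. */
class SketchConnection {
  // Idea 1: one thread plays the role of the upstream-handler thread, so
  // close() and write() can never interleave with each other.
  private final ExecutorService ioThread = Executors.newSingleThreadExecutor(r -> {
    Thread t = new Thread(r, "sketch-io");
    t.setDaemon(true); // do not keep the JVM alive for this sketch
    return t;
  });

  // Set by the app thread, read when deciding whether to ignore an error.
  private volatile boolean closedByClient = false;

  // Only read or written on ioThread.
  private boolean channelOpen = true;

  /** Runs the write on the I/O thread; returns false if the channel is closed. */
  boolean write(String message) {
    return runOnIoThread(() -> channelOpen);
  }

  /** Marks the close as client-initiated, then closes on the I/O thread. */
  void close() {
    closedByClient = true;
    runOnIoThread(() -> {
      channelOpen = false;
      return null;
    });
  }

  /** Idea 3: swallow SSLException only when the client itself closed. */
  boolean shouldIgnore(Throwable t) {
    return closedByClient && t instanceof SSLException;
  }

  // Submits the task to the single I/O thread and waits for its result.
  private <T> T runOnIoThread(Callable<T> task) {
    try {
      return ioThread.submit(task).get();
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    SketchConnection conn = new SketchConnection();
    System.out.println(conn.write("negotiate"));
    conn.close();
    System.out.println(conn.write("late message"));
    System.out.println(conn.shouldIgnore(new SSLException("renegotiation attempted by peer")));
  }
}
```

Whether routing close() through the handler thread is enough depends on how
SslHandler issues its close message internally, which this sketch does not
model.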
> Java client logs NPE if a connection is closed by client during negotiation
> ---------------------------------------------------------------------------
>
> Key: KUDU-1963
> URL: https://issues.apache.org/jira/browse/KUDU-1963
> Project: Kudu
> Issue Type: Bug
> Components: client
> Affects Versions: 1.3.0
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Priority: Critical
>
> This is noted in KUDU-1894 but I don't know if it's the root cause of the
> ITClient flakiness, so I'm opening a new JIRA for this:
> If the client is closed (or a connection to a TS is closed) while it's in the
> process of negotiating, it will result in an error stating
> "javax.net.ssl.SSLException: renegotiation attempted by peer; closing the
> connection" followed by an NPE in sendQueuedRpcs().
> This is being triggered by Impala in a stress workload with a high query rate.