[
https://issues.apache.org/jira/browse/CASSANDRA-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16801295#comment-16801295
]
Benedict commented on CASSANDRA-14503:
--------------------------------------
Thanks for the patch! And sorry for taking so long to get back to you. We did
find [some
bugs|https://gist.github.com/belliottsmith/0d12df678d8e9ab06776e29116d56b91],
and other areas for improvement; I've filed CASSANDRA-15066 as a follow-up, and
look forward to hearing your feedback.
> Internode connection management is race-prone
> ---------------------------------------------
>
> Key: CASSANDRA-14503
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14503
> Project: Cassandra
> Issue Type: Bug
> Components: Legacy/Streaming and Messaging
> Reporter: Sergio Bossa
> Assignee: Jason Brown
> Priority: Normal
> Labels: pull-request-available
> Fix For: 4.0
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Following CASSANDRA-8457, internode connection management has been rewritten
> to rely on Netty, but the new implementation in
> {{OutboundMessagingConnection}} seems quite race-prone to me, in particular
> in these two cases:
> * {{#finishHandshake()}} racing with {{#close()}}: e.g. the former could run
> into an NPE if the latter nulls the {{channelWriter}} (this is just one
> example; other conflicts might happen).
> * Connection timeout and retry racing with state-changing methods:
> {{connectionRetryFuture}} and {{connectionTimeoutFuture}} are cancelled when
> handshaking or closing, but there's no guarantee they will actually be
> cancelled (they might already be running), so they might end up changing
> the connection state concurrently with other methods (e.g. by unexpectedly
> closing the channel or clearing the backlog).
> Overall, the thread safety of {{OutboundMessagingConnection}} is very
> difficult to assess given the current implementation: I would suggest
> refactoring it into a single-threaded model, where all actions that change
> connection state are enqueued on a single-threaded scheduler, so that state
> transitions can be clearly defined and checked.
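The single-threaded model suggested in the description can be sketched in Java
roughly as follows. This is only an illustration under stated assumptions, not
the Cassandra implementation or the fix that was eventually committed: the
class SingleThreadedConnection, its State enum, and every method body are
hypothetical, and only the names finishHandshake() and close() echo the
description. The idea is that each public method merely enqueues a task on one
scheduler thread, so a timeout can never run concurrently with a handshake or a
close, and a task that arrives late sees the current state and does nothing.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch; not Cassandra's OutboundMessagingConnection. */
final class SingleThreadedConnection {
    enum State { NOT_READY, CONNECTING, READY, CLOSED }

    // One thread runs every task below, so the fields need no locks or volatile.
    private final ScheduledExecutorService loop =
            Executors.newSingleThreadScheduledExecutor();
    private State state = State.NOT_READY;
    private ScheduledFuture<?> timeout;

    /** Starts a connection attempt and arms a timeout on the same thread. */
    void connect(long timeoutMillis) {
        loop.execute(() -> {
            if (state != State.NOT_READY) return; // already connecting, ready, or closed
            state = State.CONNECTING;
            timeout = loop.schedule(this::onTimeout, timeoutMillis, TimeUnit.MILLISECONDS);
        });
    }

    /** Completes the handshake, unless close() or the timeout got there first. */
    void finishHandshake() {
        loop.execute(() -> {
            if (state != State.CONNECTING) return; // explicit transition check
            timeout.cancel(false);
            state = State.READY;
        });
    }

    /** Closes the connection; CLOSED is terminal in this sketch. */
    void close() {
        loop.execute(() -> {
            if (timeout != null) timeout.cancel(false);
            state = State.CLOSED;
        });
    }

    // Runs on the loop thread, so it cannot interleave with the tasks above.
    private void onTimeout() {
        if (state != State.CONNECTING) return; // stale timeout: nothing to undo
        state = State.NOT_READY;
    }

    /** Reads the state by going through the loop thread as well. */
    State stateNow() {
        try {
            return loop.submit(() -> state).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    void shutdown() {
        loop.shutdown();
    }
}
```

With this shape, calling connect(), then close(), then finishHandshake() leaves
the connection CLOSED: the late handshake task finds the state is no longer
CONNECTING and returns without touching anything, which is the kind of
explicit, checkable transition the description asks for.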