[ https://issues.apache.org/jira/browse/CASSANDRA-15666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17068591#comment-17068591 ]
ZhaoYang commented on CASSANDRA-15666:
--------------------------------------
{quote}4) At this point, if this was the only file to stream, both nodes are
ready to close the session via {{maybeCompleted()}}, but:
a) Node A will call it twice from both the IO thread and the thread at #1,
closing the session and its channels.
b) Node B will attempt to send a {{CompleteMessage}}, but will fail because the
session has been closed in the meantime.
{quote}
This can be reproduced by delaying {{maybeComplete}} in {{prepareAsync}} until
the requests/transfers on the follower side are empty.
{quote}I believe the best fix would be to modify the message exchange so that:
1) Only the "follower" is allowed to send the {{CompleteMessage}}.
2) Only the "initiator" is allowed to close the session and its channels after
receiving the {{CompleteMessage}}.
{quote}
The above points will definitely make the streaming state easier to reason
about. But they may not be sufficient: the follower can still send two
{{CompleteMessage}}s when {{maybeComplete()}} in {{prepareAsync()}} is delayed
and races with {{maybeComplete()}} in {{taskCompleted()}}:
1) The follower sends {{PrepareSynAckMessage}} from the {{prepareAsync()}}
thread, and its {{maybeComplete()}} call is delayed.
2) The initiator receives it and starts streaming.
3) The follower receives the streamed files and sends {{ReceivedMessage}}.
4) The follower receives all streamed files and triggers {{maybeComplete()}} in
{{taskCompleted()}}.
5) The follower sends two {{CompleteMessage}}s, one from step 1) and one from
step 4).
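To make steps 1) to 5) concrete, here is a minimal Java sketch. It is a hypothetical single-JVM model, not Cassandra code: {{NaiveFollowerSession}}, {{NaiveCompletionRace}} and the string "messages" are made up for illustration, and the {{CountDownLatch}} stands in for the artificial delay used to reproduce the issue. Note that {{maybeComplete()}} is {{synchronized}} here and the follower still sends two {{CompleteMessage}}s, because nothing records that one was already sent.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified model of the follower side: NOT Cassandra's StreamSession.
class NaiveFollowerSession {
    final List<String> sentMessages = new CopyOnWriteArrayList<>();
    private int pendingReceives;

    NaiveFollowerSession(int filesToReceive) {
        this.pendingReceives = filesToReceive;
    }

    // IO thread: a streamed file has been fully received.
    synchronized void taskCompleted() {
        sentMessages.add("ReceivedMessage");
        pendingReceives--;
        maybeComplete();
    }

    // Mutually exclusive, but no state records that CompleteMessage was already
    // sent, so every caller that sees no pending work sends it again.
    synchronized void maybeComplete() {
        if (pendingReceives == 0)
            sentMessages.add("CompleteMessage");
    }
}

class NaiveCompletionRace {
    static long runScenario() {
        NaiveFollowerSession follower = new NaiveFollowerSession(1);
        CountDownLatch synAckSent = new CountDownLatch(1);
        CountDownLatch allFilesReceived = new CountDownLatch(1);
        ExecutorService prepareThread = Executors.newSingleThreadExecutor();
        ExecutorService ioThread = Executors.newSingleThreadExecutor();

        // Step 1: prepareAsync() sends PrepareSynAckMessage, then its maybeComplete()
        // is delayed (artificially: until the IO thread has received every file).
        prepareThread.execute(() -> {
            follower.sentMessages.add("PrepareSynAckMessage");
            synAckSent.countDown();
            awaitLatch(allFilesReceived);
            follower.maybeComplete();
        });

        // Steps 2-4: the initiator streams the only file; the IO thread completes the task.
        ioThread.execute(() -> {
            awaitLatch(synAckSent);
            follower.taskCompleted();
            allFilesReceived.countDown();
        });

        shutdownAndWait(prepareThread);
        shutdownAndWait(ioThread);
        // Step 5: count the CompleteMessages the follower sent.
        return follower.sentMessages.stream().filter("CompleteMessage"::equals).count();
    }

    private static void awaitLatch(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    private static void shutdownAndWait(ExecutorService executor) {
        executor.shutdown();
        try {
            if (!executor.awaitTermination(5, TimeUnit.SECONDS))
                throw new IllegalStateException("executor did not terminate");
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("CompleteMessages sent: " + runScenario()); // prints 2
    }
}
```

The latch ordering makes the interleaving deterministic, so the double send shows up on every run rather than only under unlucky scheduling.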
I think we also need stronger synchronization around state transitions and the
sending of {{CompleteMessage}}. WDYT?
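One possible shape for that synchronization is to make the state transition itself the guard: only the thread that wins a {{compareAndSet}} from {{STREAMING}} to {{WAIT_COMPLETE}} sends {{CompleteMessage}}. The Java sketch below is a hypothetical model, not a patch: the class names, the two-value {{State}} enum and the string "messages" are assumptions for illustration only.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical, simplified follower model: NOT Cassandra's StreamSession.
class GuardedFollowerSession {
    // Illustrative subset of states; a real session has more.
    enum State { STREAMING, WAIT_COMPLETE }

    final List<String> sentMessages = new CopyOnWriteArrayList<>();
    private final AtomicReference<State> state = new AtomicReference<>(State.STREAMING);
    private final AtomicInteger pendingReceives;

    GuardedFollowerSession(int filesToReceive) {
        this.pendingReceives = new AtomicInteger(filesToReceive);
    }

    // IO thread: a streamed file has been fully received.
    void taskCompleted() {
        sentMessages.add("ReceivedMessage");
        pendingReceives.decrementAndGet();
        maybeComplete();
    }

    // May be called from any thread, any number of times.
    boolean maybeComplete() {
        if (pendingReceives.get() > 0)
            return false;
        // The state transition is the guard: only the thread that wins the CAS sends.
        if (!state.compareAndSet(State.STREAMING, State.WAIT_COMPLETE))
            return false;
        sentMessages.add("CompleteMessage"); // no lock is held while "sending"
        return true;
    }

    long completeMessagesSent() {
        return sentMessages.stream().filter("CompleteMessage"::equals).count();
    }
}

class GuardedCompletionDemo {
    // Races the IO thread's taskCompleted() against several delayed
    // prepareAsync()-style maybeComplete() calls, all released at once.
    static long runScenario(int delayedCallers) {
        GuardedFollowerSession follower = new GuardedFollowerSession(1);
        CountDownLatch start = new CountDownLatch(1);
        Thread[] threads = new Thread[delayedCallers + 1];
        threads[0] = new Thread(() -> {
            awaitLatch(start);
            follower.taskCompleted();
        });
        for (int i = 1; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                awaitLatch(start);
                follower.maybeComplete();
            });
        }
        for (Thread t : threads)
            t.start();
        start.countDown();
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return follower.completeMessagesSent();
    }

    private static void awaitLatch(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("CompleteMessages sent: " + runScenario(8)); // prints 1
    }
}
```

Whatever the interleaving, exactly one caller wins the CAS, so exactly one {{CompleteMessage}} is sent. Using a CAS rather than a lock around the send is a design choice in this sketch: it avoids holding a lock while doing network I/O. A {{synchronized}} block that checks and updates the state before sending would work equally well.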
> Race condition when completing stream sessions
> ----------------------------------------------
>
> Key: CASSANDRA-15666
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15666
> Project: Cassandra
> Issue Type: Bug
> Components: Legacy/Streaming and Messaging
> Reporter: Sergio Bossa
> Assignee: ZhaoYang
> Priority: Normal
>
> {{StreamSession#prepareAsync()}} executes, as the name implies,
> asynchronously from the IO thread: this opens the door to race conditions
> between the sending of the {{PrepareSynAckMessage}} and the call to
> {{StreamSession#maybeCompleted()}}. For example, the following could happen:
> 1) Node A sends {{PrepareSynAckMessage}} from the {{prepareAsync()}} thread.
> 2) Node B receives it and starts streaming.
> 3) Node A receives the streamed file and sends {{ReceivedMessage}}.
> 4) At this point, if this was the only file to stream, both nodes are ready
> to close the session via {{maybeCompleted()}}, but:
> a) Node A will call it twice from both the IO thread and the thread at #1,
> closing the session and its channels.
> b) Node B will attempt to send a {{CompleteMessage}}, but will fail because
> the session has been closed in the meantime.
> There are other subtle variations of the pattern above, depending on the
> order of concurrently sent/received messages.
> I believe the best fix would be to modify the message exchange so that:
> 1) Only the "follower" is allowed to send the {{CompleteMessage}}.
> 2) Only the "initiator" is allowed to close the session and its channels
> after receiving the {{CompleteMessage}}.
> By doing so, the message exchange logic would be easier to reason about,
> which is overall a win anyway.
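The proposed exchange can be sketched in Java as well. This is a hypothetical single-threaded model, not Cassandra code: the in-memory peer wiring, {{RoleBasedSession}} and {{onChannelsClosed()}} are assumptions for illustration. The follower is the only side that sends {{CompleteMessage}}, and the initiator is the only side that closes the session, so the follower never attempts a send on a session that has already been closed (case b above). It deliberately does not address the double-send race raised in the comment, which needs a separate guard on the follower's state transition.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-threaded model of the proposed exchange: NOT Cassandra's StreamSession.
class RoleBasedSession {
    enum Role { INITIATOR, FOLLOWER }

    final Role role;
    final List<String> log = new ArrayList<>();
    private RoleBasedSession peer;
    private int pendingTasks; // transfers/receives this side is still waiting for
    private boolean closed;

    RoleBasedSession(Role role, int pendingTasks) {
        this.role = role;
        this.pendingTasks = pendingTasks;
    }

    static void connect(RoleBasedSession a, RoleBasedSession b) {
        a.peer = b;
        b.peer = a;
    }

    boolean isClosed() {
        return closed;
    }

    void taskCompleted() {
        pendingTasks--;
        maybeCompleted();
    }

    void maybeCompleted() {
        if (pendingTasks > 0 || closed)
            return;
        // Rule 1: only the follower sends CompleteMessage; the initiator keeps waiting.
        if (role == Role.FOLLOWER)
            send("CompleteMessage");
    }

    private void send(String message) {
        if (peer.closed)
            throw new IllegalStateException(role + " cannot send " + message + ": session already closed");
        log.add("sent " + message);
        peer.receive(message);
    }

    private void receive(String message) {
        log.add("received " + message);
        if (message.equals("CompleteMessage")) {
            if (role != Role.INITIATOR)
                throw new IllegalStateException("only the initiator may receive CompleteMessage");
            // Rule 2: only the initiator closes the session and its channels.
            closed = true;
            peer.onChannelsClosed();
        }
    }

    // Assumption: the follower learns the session is over when the initiator closes the channels.
    private void onChannelsClosed() {
        closed = true;
    }
}

class RoleBasedDemo {
    public static void main(String[] args) {
        RoleBasedSession initiator = new RoleBasedSession(RoleBasedSession.Role.INITIATOR, 1);
        RoleBasedSession follower = new RoleBasedSession(RoleBasedSession.Role.FOLLOWER, 1);
        RoleBasedSession.connect(initiator, follower);
        initiator.taskCompleted(); // its only file was acknowledged: it just waits
        follower.taskCompleted();  // last file received: send CompleteMessage
        System.out.println("initiator closed=" + initiator.isClosed()
                + ", follower closed=" + follower.isClosed());
    }
}
```

Because closing is tied to receiving {{CompleteMessage}}, there is a single, well-defined point at which the session ends, which is the "easier to reason about" property the proposal is after.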
--
This message was sent by Atlassian Jira
(v8.3.4#803005)