[
https://issues.apache.org/jira/browse/CASSANDRA-15667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17079258#comment-17079258
]
Sergio Bossa commented on CASSANDRA-15667:
------------------------------------------
Thanks [~e.dimitrova] for chiming in.
{quote}From what I recall the bootstrap was sometimes completing too fast
before the streaming is really interrupted from the byteman code and we didn't
really have an instrument to control that.
{quote}
I went through the "resumable bootstrap" test, and unfortunately I don't see
how the bootstrap could ever complete before the byteman script is invoked:
the script [makes node 1 fail before it starts to stream
files|https://github.com/ekaterinadimitrova2/cassandra-dtest/blob/b56887d67c353d6d69cd60cfd74859405fa37685/byteman/4.0/stream_failure.btm#L10],
so node 3 cannot finish bootstrapping until it has received all files from
both nodes, and that never happens because the script causes node 1 to fail.
So why did the test fail?
I believe that's because of this very issue: node 3 was correctly seeing its
streaming session as completed (after node 1 finished streaming with an
error) but *not* as failed. The "completed" state is read from the actual
session state, while the "failed" state is read from the {{SessionInfo}}
state, which is what we're fixing here.
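To make the inconsistency concrete, here is a minimal, self-contained sketch. It is not the Cassandra code and not the actual patch: {{StreamCheckSketch}}, {{Session}}, {{liveState}}, {{infoState}} and both {{...Outcome}} methods are hypothetical names invented for illustration. It only assumes what is described above, i.e. that "completed" is derived from the live session state while "failed" is derived from a separately updated info state.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model; none of these types are Cassandra's real classes.
final class StreamCheckSketch {
    enum State { STREAMING, COMPLETE, FAILED }

    // One peer session: the live state plus a separately updated info snapshot.
    static final class Session {
        final State liveState; // stands in for the actual StreamSession state
        final State infoState; // stands in for the SessionInfo state

        Session(State liveState, State infoState) {
            this.liveState = liveState;
            this.infoState = infoState;
        }
    }

    // Inconsistent check: "all done?" reads liveState, "any failure?" reads infoState.
    static String inconsistentOutcome(List<Session> sessions) {
        boolean active = sessions.stream().anyMatch(s -> s.liveState == State.STREAMING);
        if (active) return "PENDING";
        boolean failed = sessions.stream().anyMatch(s -> s.infoState == State.FAILED);
        return failed ? "FAILED" : "SUCCESS";
    }

    // Consistent check: both questions are answered from the same source.
    static String consistentOutcome(List<Session> sessions) {
        boolean active = sessions.stream().anyMatch(s -> s.liveState == State.STREAMING);
        if (active) return "PENDING";
        boolean failed = sessions.stream().anyMatch(s -> s.liveState == State.FAILED);
        return failed ? "FAILED" : "SUCCESS";
    }

    public static void main(String[] args) {
        // Node 1's session has already failed, but its info snapshot has not caught up yet.
        List<Session> sessions = Arrays.asList(
            new Session(State.FAILED, State.STREAMING),
            new Session(State.COMPLETE, State.COMPLETE));
        System.out.println(inconsistentOutcome(sessions));
        System.out.println(consistentOutcome(sessions));
    }
}
```

With the stale info snapshot, the first check reports a successful completion even though one session failed, which matches what I think node 3 observed in the test; reading both answers from the same state avoids that.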
That said, I would propose to still re-introduce the original
{{resumable_bootstrap_test}}, because it's an important enough feature to
deserve its own test, and it uses 3 nodes which increases the chances of
detecting errors/races.
Thoughts?
> StreamResultFuture check for completeness is inconsistent, leading to races
> ---------------------------------------------------------------------------
>
> Key: CASSANDRA-15667
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15667
> Project: Cassandra
> Issue Type: Bug
> Components: Legacy/Streaming and Messaging
> Reporter: Sergio Bossa
> Assignee: Massimiliano Tomassi
> Priority: Normal
> Fix For: 4.0
>
>
> {{StreamResultFuture#maybeComplete()}} uses
> {{StreamCoordinator#hasActiveSessions()}} to determine if all sessions are
> completed, but then accesses each session state via
> {{StreamCoordinator#getAllSessionInfo()}}: this is inconsistent, as the
> former relies on the actual {{StreamSession}} state, while the latter on the
> {{SessionInfo}} state, and the two are concurrently updated with no
> coordination whatsoever.
> This leads to races, as seen in some spurious dtest failures, such as
> {{TestBootstrap.resumable_bootstrap_test}} in CASSANDRA-15614 cc
> [~e.dimitrova].