[
https://issues.apache.org/jira/browse/CASSANDRA-16759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17369166#comment-17369166
]
Jon Meredith commented on CASSANDRA-16759:
------------------------------------------
I've been investigating a few of the test failures. They seem to be caused by
the node not waiting to receive an up-to-date schema: it starts bootstrap with
the default schema, which contains no non-system keyspaces, and so does not
stream anything.
In 4.0, MigrationCoordinator is responsible for waiting until all schema has
been received, and it is told about schema versions by the
StorageService.onChange listener. The listener only processes
ApplicationState.SCHEMA entries if the endpoint exists in TokenMetadata.
Endpoints are added to TokenMetadata when StorageService.onJoin handles the
STATUS or STATUS_WITH_PORT application states.
The EnumMap.values() that onJoin iterates over seems to return the application
states in the order they are defined in the enum, so if STATUS is present, it
comes first and all is good.
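As a quick check of the ordering claim, here is a minimal sketch. The State enum is a hypothetical, shortened stand-in for ApplicationState (the real enum has many more constants), and the values are placeholders. It relies only on the documented behavior that java.util.EnumMap iterates in the natural order of its keys, i.e. the order in which the enum constants are declared, regardless of insertion order.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

class EnumMapOrderDemo {
    // Hypothetical, shortened stand-in for ApplicationState; not the real enum.
    enum State { STATUS, LOAD, SCHEMA, STATUS_WITH_PORT, SSTABLE_VERSIONS }

    // Builds a map whose entries are inserted in a deliberately scrambled order.
    static Map<State, String> scrambled() {
        Map<State, String> states = new EnumMap<>(State.class);
        states.put(State.SSTABLE_VERSIONS, "placeholder");
        states.put(State.SCHEMA, "placeholder");
        states.put(State.STATUS_WITH_PORT, "placeholder");
        states.put(State.STATUS, "placeholder");
        return states;
    }

    // Returns the keys in the order the map iterates over them.
    static List<State> iterationOrder(Map<State, String> states) {
        return new ArrayList<>(states.keySet());
    }

    public static void main(String[] args) {
        // prints [STATUS, SCHEMA, STATUS_WITH_PORT, SSTABLE_VERSIONS]
        System.out.println(iterationOrder(scrambled()));
    }
}
```

The same ordering applies to values() and entrySet(), so a handler that iterates the map sees STATUS before everything else whenever STATUS is present.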
If STATUS is not present, as when a 4.0 cluster thinks there are no nodes
with a version lower than 4.0 and gossip filters it out, then only the items in
ApplicationState after STATUS_WITH_PORT (currently only SSTABLE_VERSIONS) will
be processed by onChange. It then takes a subsequent gossip of that
ApplicationState to apply the other states, which is making the tests racy.
This is all very fiddly and I'm not 100% sure that's the exact sequence, but
there is definitely a change in behavior for when nodes switch to not having
STATUS any more.
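To make the suspected sequence concrete, here is a toy model, not the Cassandra code. It assumes a shortened, hypothetical State enum, uses a boolean in place of TokenMetadata membership, and simplifies onChange to "drop every state seen before the endpoint is known". All names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

class OnJoinOrderSketch {
    // Hypothetical, shortened stand-in for ApplicationState; not the real enum.
    enum State { STATUS, LOAD, SCHEMA, STATUS_WITH_PORT, SSTABLE_VERSIONS }

    // Builds the states a peer gossips, with or without the legacy STATUS entry.
    static Map<State, String> gossip(boolean includeLegacyStatus) {
        Map<State, String> states = new EnumMap<>(State.class);
        for (State s : State.values()) {
            if (s != State.STATUS || includeLegacyStatus) {
                states.put(s, "placeholder");
            }
        }
        return states;
    }

    // Returns the states that take effect in a single pass over the map.
    static List<State> applied(Map<State, String> gossiped) {
        boolean endpointKnown = false; // stands in for TokenMetadata membership
        List<State> applied = new ArrayList<>();
        for (State state : gossiped.keySet()) { // enum declaration order
            if (state == State.STATUS || state == State.STATUS_WITH_PORT) {
                endpointKnown = true; // the status handlers register the endpoint
                applied.add(state);
            } else if (endpointKnown) {
                applied.add(state);
            }
            // Anything seen before the endpoint is known is silently dropped.
        }
        return applied;
    }

    public static void main(String[] args) {
        // prints [STATUS, LOAD, SCHEMA, STATUS_WITH_PORT, SSTABLE_VERSIONS]
        System.out.println(applied(gossip(true)));
        // prints [STATUS_WITH_PORT, SSTABLE_VERSIONS]
        System.out.println(applied(gossip(false)));
    }
}
```

In this model, SCHEMA is lost on the first pass once STATUS disappears, which matches the symptom of bootstrap starting before the schema is known.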
I've pushed up a minimal change to onJoin to make it behave [on a
branch|https://github.com/jonmeredith/cassandra/pull/new/marcuse/16759-fix-status-with-port],
with [CircleCI
Here|https://app.circleci.com/pipelines/github/jonmeredith/cassandra?branch=marcuse%2F16759-fix-status-with-port]
A possibly cleaner alternative would be to sort with a custom key comparator,
but I wasn't sure about the performance during gossip storms.
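For reference, the comparator alternative could look roughly like the sketch below. This is not the change on the branch; the State enum is a hypothetical, shortened stand-in for ApplicationState, and rank, STATUS_FIRST and ordered are names invented for illustration. It also shows where the cost comes from: each call copies the keys into a list and sorts it.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

class StatusFirstOrdering {
    // Hypothetical, shortened stand-in for ApplicationState; not the real enum.
    enum State { STATUS, LOAD, SCHEMA, STATUS_WITH_PORT, SSTABLE_VERSIONS }

    // Status states rank 0 so they sort ahead of everything else.
    static int rank(State s) {
        return (s == State.STATUS || s == State.STATUS_WITH_PORT) ? 0 : 1;
    }

    // Status states first, then the usual enum declaration order.
    static final Comparator<State> STATUS_FIRST =
        Comparator.comparingInt(StatusFirstOrdering::rank)
                  .thenComparing(Comparator.<State>naturalOrder());

    // Returns the keys in the order a handler would process them.
    static List<State> ordered(Map<State, String> gossiped) {
        List<State> keys = new ArrayList<>(gossiped.keySet()); // one copy per call
        keys.sort(STATUS_FIRST);
        return keys;
    }

    public static void main(String[] args) {
        Map<State, String> gossiped = new EnumMap<>(State.class);
        gossiped.put(State.LOAD, "placeholder");
        gossiped.put(State.SCHEMA, "placeholder");
        gossiped.put(State.STATUS_WITH_PORT, "placeholder");
        gossiped.put(State.SSTABLE_VERSIONS, "placeholder");
        // prints [STATUS_WITH_PORT, LOAD, SCHEMA, SSTABLE_VERSIONS]
        System.out.println(ordered(gossiped));
    }
}
```

With this ordering the endpoint would be registered before SCHEMA is handled even when STATUS is filtered out, at the price of an allocation and a sort per call.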
> Avoid memoizing the wrong min cluster version during upgrades
> -------------------------------------------------------------
>
> Key: CASSANDRA-16759
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16759
> Project: Cassandra
> Issue Type: Bug
> Components: Consistency/Coordination
> Reporter: Marcus Eriksson
> Assignee: Marcus Eriksson
> Priority: Normal
> Fix For: 4.0-rc2
>
>
> CASSANDRA-16525 avoids trying to calculate the cluster min version if
> gossiper is not enabled.
> This makes us memoize the wrong version for up to a minute causing us to send
> 4.0-messages to 3.0 nodes, for example in
> [ColumnFilter|https://github.com/apache/cassandra/blob/05beda90a9206db165a3997a736ecb06f8dc695e/src/java/org/apache/cassandra/db/filter/ColumnFilter.java#L210]
> This was discovered by python upgrade dtests,
> [here|https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/993/workflows/2afef6f0-1356-41f6-93dc-5385ac19dca1/jobs/5977/tests#failed-test-0]
> after reverting CASSANDRA-15899 in CASSANDRA-16735