[
https://issues.apache.org/jira/browse/CASSANDRA-17366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17522978#comment-17522978
]
Brandon Williams edited comment on CASSANDRA-17366 at 4/15/22 11:30 PM:
------------------------------------------------------------------------
These tests set up and then stop a cluster, subsequently restarting it with
some combination of seeds, and then the nodes in parallel. The problem is that
the setup does not guarantee that it waits long enough for the cluster to be
completely established before it is stopped, though C* is fast enough to get
there anyway _almost_ all of the time. To ensure the setup is complete before
shutting down, the nodes should wait for the CQL interface to become available
after the initial startup. [This dtest
branch|https://github.com/driftx/cassandra-dtest/tree/CASSANDRA-17366] does
that, and here are 400 runs on
[4.0|https://app.circleci.com/pipelines/github/driftx/cassandra/438/workflows/df01fde1-1ff1-4007-bdc2-a9de8e358a16/jobs/5145]
and
[trunk|https://app.circleci.com/pipelines/github/driftx/cassandra/439/workflows/c00617e8-904c-44d7-9f4f-de565e4878cf/jobs/5143].
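
As a rough illustration of the "wait for the CQL interface" idea (not the code
in the dtest branch), here is a minimal sketch that polls a node's native
transport port until it accepts TCP connections. The function names
wait_for_port and wait_for_cluster are hypothetical, and treating an open port
as "CQL is available" is an assumption: it only shows that something is
listening, not that the node is fully ready. 9042 is the default CQL native
port.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=0.5):
    """Poll until a TCP connection to (host, port) succeeds.

    Returns True once a connection is accepted, or False if the
    timeout expires first. An open port is used here as a proxy
    for the CQL interface being up (an assumption of this sketch).
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Connect and immediately close; we only care that it is accepted.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


def wait_for_cluster(nodes, timeout=60.0):
    """Return True only if every (host, port) pair accepts connections."""
    return all(wait_for_port(host, port, timeout) for host, port in nodes)
```

A test setup could call something like
wait_for_cluster([("127.0.0.1", 9042), ("127.0.0.2", 9042)]) after the initial
start and only stop the cluster once it returns True. The addresses here are
placeholders.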
> Fix flaky test - gossip_test.TestGossip
> ---------------------------------------
>
> Key: CASSANDRA-17366
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17366
> Project: Cassandra
> Issue Type: Bug
> Components: Cluster/Gossip
> Reporter: Aleksei Zotov
> Assignee: Brandon Williams
> Priority: Normal
> Fix For: 4.0.x, 4.x
>
>
> We can see many failures on the 4.x branch:
> test_2dc_parallel_startup_one_seed
> ([916|https://ci-cassandra.apache.org/job/Cassandra-trunk/916/testReport/dtest-offheap.gossip_test/TestGossip],
> [920|https://ci-cassandra.apache.org/job/Cassandra-trunk/920/testReport/dtest.gossip_test/TestGossip/])
> test_2dc_parallel_startup
> ([929|https://ci-cassandra.apache.org/job/Cassandra-trunk/929/testReport/dtest-novnode.gossip_test/TestGossip],
> [931|https://ci-cassandra.apache.org/job/Cassandra-trunk/931/testReport/dtest.gossip_test/TestGossip],
> [936|https://ci-cassandra.apache.org/job/Cassandra-trunk/936/testReport/dtest-novnode.gossip_test/TestGossip])
> The error is always the same:
> {code:java}
> Unexpected error found in node logs (see stdout for full details). Errors:
> [ERROR [main] 2022-01-26 10:53:12,866 CassandraDaemon.java:900 - Exception encountered during startup
> java.lang.RuntimeException: Didn't receive schemas for all known versions within the timeout. Use -Dcassandra.skip_schema_check=true to skip this check.
>     at org.apache.cassandra.service.StorageService.waitForSchema(StorageService.java:1037)
>     at org.apache.cassandra.dht.BootStrapper.allocateTokens(BootStrapper.java:232)
>     at org.apache.cassandra.dht.BootStrapper.getBootstrapTokens(BootStrapper.java:180)
>     at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:1089)
>     at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:1043)
>     at org.apache.cassandra.service.StorageService.initServer(StorageService.java:821)
>     at org.apache.cassandra.service.StorageService.initServer(StorageService.java:751)
>     at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:417)
>     at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:754)
>     at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:878),
> ERROR [main] 2022-01-26 10:53:12,866 CassandraDaemon.java:900 - Exception encountered during startup
> java.lang.RuntimeException: Didn't receive schemas for all known versions within the timeout. Use -Dcassandra.skip_schema_check=true to skip this check.
>     at org.apache.cassandra.service.StorageService.waitForSchema(StorageService.java:1037)
>     at org.apache.cassandra.dht.BootStrapper.allocateTokens(BootStrapper.java:232)
>     at org.apache.cassandra.dht.BootStrapper.getBootstrapTokens(BootStrapper.java:180)
>     at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:1089)
>     at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:1043)
>     at org.apache.cassandra.service.StorageService.initServer(StorageService.java:821)
>     at org.apache.cassandra.service.StorageService.initServer(StorageService.java:751)
>     at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:417)
>     at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:754)
>     at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:878)]
> {code}