[
https://issues.apache.org/jira/browse/CASSANDRA-20659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17953549#comment-17953549
]
David Capwell commented on CASSANDRA-20659:
-------------------------------------------
Failed Builds:
||Build||Result||Reason||
| jvm-dtest-simulator | unknown | No test results found |
| jvm-upgrade-dtests | fail | Test org.apache.cassandra.distributed.upgrade.CompactStoragePagingWithProtocolV30Test::testPagingWit |
| jvm8-dtests | fail | Test org.apache.cassandra.distributed.test.RepairTest::testForcedNormalRepairWithOneNodeDown had an |
| jvm8-utests | fail | Test org.apache.cassandra.cql3.ViewComplexTTLTest::testUpdateColumnInViewPKWithTTLWithoutFlush[1] ha |
| python-dtests | fail | Test auth_test.TestNetworkAuth::auth_test.py::TestNetworkAuth::test_revoked_login had an error,Test |
||Repo||Branch||Parent Branch||SHA||Status||
|https://github.com/dcapwell/cassandra.git|commit_remote_branch/CASSANDRA-20659-cassandra-4.0-42C17482-740E-40E5-A5E2-16F925410B1E|cassandra-4.0|7e04a922c3203c0970e220643b117f5e3f1f8f5f|Unstable
Failed Builds:
||Build||Result||Reason||
| jvm-dtest-simulator | unknown | No test results found |
| jvm-upgrade-dtests | fail | Test org.apache.cassandra.distributed.upgrade.CompactStoragePagingWithProtocolV30Test::testPagingWit |
| jvm8-dtests | fail | Test org.apache.cassandra.distributed.test.RepairTest::testForcedNormalRepairWithOneNodeDown had an |
| jvm8-utests | fail | Test org.apache.cassandra.cql3.ViewComplexTTLTest::testUpdateColumnInViewPKWithTTLWithoutFlush[1] ha |
| python-dtests | fail | Test auth_test.TestNetworkAuth::auth_test.py::TestNetworkAuth::test_revoked_login had an error,Test |
||Repo||Branch||Parent Branch||SHA||Status||
|https://github.com/dcapwell/cassandra.git|commit_remote_branch/CASSANDRA-20659-cassandra-4.1-42C17482-740E-40E5-A5E2-16F925410B1E|cassandra-4.1|8cfc452b9a77e89ad06563cfdf25f150af524d9c|Unstable
Failed Builds:
||Build||Result||Reason||
| jvm11-dtests | fail | Test org.apache.cassandra.distributed.test.NetstatsBootstrapWithEntireSSTablesCompressionStreamingTe |
| jvm11-utests-long | fail | Test org.apache.cassandra.cql3.ViewLongTest::testExpiredLivenessInfoWithDefaultTTLWithFlush[0] faile |
| python-upgrade-dtests | fail | Test upgrade_tests.cql_tests.cls::upgrade_tests.py::cql_tests::test_empty_in had an error,Test upgra |
||Repo||Branch||Parent Branch||SHA||Status||
|https://github.com/dcapwell/cassandra.git|commit_remote_branch/CASSANDRA-20659-cassandra-5.0-42C17482-740E-40E5-A5E2-16F925410B1E|cassandra-5.0|c2ad8e703375af6e7848c8a52592cb3df5f7a7b3|Unstable
Failed Builds:
||Build||Result||Reason||
| jvm11-utests | fail | Test org.apache.cassandra.io.sstable.SSTableReaderTest::testSpannedIndexPositions-cassandra.testtag_ |
| jvm17-utests | fail | Test org.apache.cassandra.io.sstable.SSTableReaderTest::testSpannedIndexPositions-cassandra.testtag_ |
| python-dtests | fail | Test gossip_test.TestGossip::gossip_test.py::TestGossip::test_2dc_parallel_startup failed,Test cqlsh |
| python-dtests-large | fail | Test consistency_test.TestAccuracy::consistency_test.py::TestAccuracy::test_network_topology_strateg |
| python-upgrade-dtests | unknown | No test results found |
| python-upgrade-dtests-large | unknown | No test results found |
||Repo||Branch||Parent Branch||SHA||Status||
|https://github.com/dcapwell/cassandra.git|commit_remote_branch/CASSANDRA-20659-trunk-42C17482-740E-40E5-A5E2-16F925410B1E|trunk|f2fdc52c5b8c900b350ff5f4c81dd8a33df7530b|Unstable
Failed Builds:
||Build||Result||Reason||
| jvm11-dtests-fuzz | unknown | No test results found |
| jvm11-utests | fail | Test org.apache.cassandra.io.sstable.SSTableReaderTest::testSpannedIndexPositions-cassandra.testtag_ |
| jvm17-dtests | fail | Test org.apache.cassandra.distributed.test.accord.AccordIncrementalRepairTest::onlyAccordWithForceTe |
| jvm17-utests | fail | Test org.apache.cassandra.io.sstable.SSTableReaderTest::testSpannedIndexPositions-cassandra.testtag_ |
| jvm17-utests-long | fail | Test org.apache.cassandra.cql3.ViewLongTest::testExpiredLivenessInfoWithDefaultTTLWithFlush[2]-_jdk1 |
| python-dtests | fail | Test gossip_test.TestGossip::gossip_test.py::TestGossip::test_2dc_parallel_startup_one_seed failed,T |
| python-upgrade-dtests | fail | Test upgrade_tests.upgrade_through_versions_test.TestUpgrade_current_5_0_x_To_indev_trunk::upgrade_t |
> Gossip doesn't converge due to race condition when updating EndpointStates
> multiple fields
> ------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-20659
> URL: https://issues.apache.org/jira/browse/CASSANDRA-20659
> Project: Apache Cassandra
> Issue Type: Bug
> Components: Cluster/Gossip
> Reporter: David Capwell
> Assignee: David Capwell
> Priority: Normal
> Fix For: 4.0.18, 4.1.10, 5.0.5, 5.1
>
> Attachments:
> ci_summary-cassandra-4.0-7e04a922c3203c0970e220643b117f5e3f1f8f5f.html,
> ci_summary-cassandra-4.1-8cfc452b9a77e89ad06563cfdf25f150af524d9c.html,
> ci_summary-cassandra-5.0-c2ad8e703375af6e7848c8a52592cb3df5f7a7b3.html,
> ci_summary-trunk-f2fdc52c5b8c900b350ff5f4c81dd8a33df7530b.html,
> result_details-cassandra-4.0-7e04a922c3203c0970e220643b117f5e3f1f8f5f.tar.gz,
> result_details-cassandra-5.0-c2ad8e703375af6e7848c8a52592cb3df5f7a7b3.tar.gz,
> result_details-trunk-f2fdc52c5b8c900b350ff5f4c81dd8a33df7530b.tar.gz
>
> Time Spent: 50m
> Remaining Estimate: 0h
>
> The issue: during shrinks or token moves, the cluster gets into a state
> where some of the nodes never converge and never see the latest STATUS
> state for the changed peers.
> Testing showed that:
> 1) org.apache.cassandra.gms.Gossiper#applyStateLocally expects to run in a
> single thread, so it doesn't take any locks
> 2) org.apache.cassandra.gms.Gossiper.GossipTask runs in another thread and
> uses a taskLock to avoid sending partial state
> 3) org.apache.cassandra.gms.Gossiper#applyNewStates gets called when the
> generation matches, and tries to apply the state sequentially.
> The theory (and test) is:
> 1) localState.setHeartBeatState(remoteState.getHeartBeatState()); runs
> 2) something (gossip or paxos) reads the state
> 3) localState.addApplicationStates(updatedStates); updates the state
> The "something" in step 2 sends the heartbeat around, which causes others to
> see a higher max version, so the delta logic won't see the mutations done in
> step 3.
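The interleaving in the quoted description can be replayed deterministically with a toy model. This is only a sketch under stated assumptions, not Cassandra code: `PeerView`, `Versioned`, `deltaNewerThan` and `statusSeenByC` are hypothetical names, the version numbers are made up, and the "delta logic" is reduced to "send every application state whose version is greater than the requester's max version".

```java
import java.util.HashMap;
import java.util.Map;

public class GossipRaceSketch {

    /** A value tagged with the version at which it was written. */
    static final class Versioned {
        final String value;
        final int version;
        Versioned(String value, int version) { this.value = value; this.version = version; }
    }

    /** Hypothetical, simplified view that one node holds of a changed peer. */
    static final class PeerView {
        int heartbeatVersion;
        final Map<String, Versioned> appStates = new HashMap<>();

        PeerView copy() {
            PeerView c = new PeerView();
            c.heartbeatVersion = heartbeatVersion;
            c.appStates.putAll(appStates);
            return c;
        }

        /** Highest version across the heartbeat and all application states. */
        int maxVersion() {
            int max = heartbeatVersion;
            for (Versioned v : appStates.values()) max = Math.max(max, v.version);
            return max;
        }

        /** Simplified delta: only states newer than what the requester already claims. */
        Map<String, Versioned> deltaNewerThan(int knownMaxVersion) {
            Map<String, Versioned> out = new HashMap<>();
            for (Map.Entry<String, Versioned> e : appStates.entrySet())
                if (e.getValue().version > knownMaxVersion) out.put(e.getKey(), e.getValue());
            return out;
        }
    }

    /** Returns the STATUS that a third node C holds after B applies an update. */
    static String statusSeenByC(boolean readerRunsBetweenSteps) {
        // B and C start with the same, older view of the changed peer.
        PeerView onB = new PeerView();
        onB.heartbeatVersion = 6;
        onB.appStates.put("STATUS", new Versioned("NORMAL", 5));
        PeerView onC = onB.copy();

        // Incoming remote state for the peer (made-up versions).
        int remoteHeartbeat = 11;
        Map<String, Versioned> updatedStates = new HashMap<>();
        updatedStates.put("STATUS", new Versioned("LEFT", 10));

        // Step 1: the heartbeat is applied first.
        onB.heartbeatVersion = remoteHeartbeat;

        // Step 2: a reader snapshots the half-applied state and forwards it to C.
        if (readerRunsBetweenSteps) onC = onB.copy();

        // Step 3: the application states are applied afterwards.
        onB.appStates.putAll(updatedStates);

        // Later round: C asks B for anything newer than C's max version.
        onC.appStates.putAll(onB.deltaNewerThan(onC.maxVersion()));
        onC.heartbeatVersion = Math.max(onC.heartbeatVersion, onB.heartbeatVersion);
        return onC.appStates.get("STATUS").value;
    }

    public static void main(String[] args) {
        System.out.println("no interleaving: " + statusSeenByC(false));      // LEFT
        System.out.println("reader between steps: " + statusSeenByC(true));  // NORMAL
    }
}
```

In this model, once C has adopted heartbeat version 11 alongside the old STATUS, its max version already exceeds the version of the new STATUS (10), so no later delta request picks that STATUS up.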
--
This message was sent by Atlassian Jira
(v8.20.10#820010)