[
https://issues.apache.org/jira/browse/CASSANDRA-16094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Marcus Eriksson updated CASSANDRA-16094:
----------------------------------------
Test and Documentation Plan: new jvm dtest, cci run
Status: Patch Available (was: Open)
This failure looks to be caused by a gossip shutdown race.
During {{drain()}} we send {{GOSSIP_SHUTDOWN}} messages to all live
endpoints, which marks the node as down on all other nodes. Sometimes another
node receives a {{GossipDigestAck}} from the shutting-down node after the
{{GOSSIP_SHUTDOWN}} message. That node then sends an {{ECHO_REQ}} to the
shutting-down node, which replies, and the shutting-down node gets marked as UP
again.
In this case, the mutation that is supposed to go only to node1 gets queued up
and applied once the other node comes back. As a result, we get no digest
mismatch and no repair data tracking warning.
The [patch|https://github.com/krummas/cassandra/commits/marcuse/16094] avoids
replying to an {{ECHO_REQ}} if we are shutting down.
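
To make the ordering concrete, here is a toy model of the race and of the guard, written in Python for brevity. It is only a sketch under stated assumptions: the {{Node}} class, its methods, and the {{guard_echo}} flag are hypothetical names invented for illustration and do not correspond to the actual Gossiper code or to the patch itself. It collapses the echo request and response into a single call.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy stand-in for a node; hypothetical, not the real Gossiper API."""
    name: str
    shutting_down: bool = False
    guard_echo: bool = False  # True models "do not reply to echo while shutting down"
    live: set = field(default_factory=set)

    def on_gossip_shutdown(self, sender):
        # The peer announced its shutdown: mark it as down locally.
        self.live.discard(sender.name)

    def on_gossip_digest_ack(self, sender):
        # A late ack from a peer we consider down triggers an echo probe.
        # If the peer answers the probe, it is marked as UP again.
        if sender.name not in self.live and sender.on_echo_request():
            self.live.add(sender.name)

    def on_echo_request(self):
        # Return True if this node replies to the echo probe.
        if self.guard_echo and self.shutting_down:
            return False
        return True


def simulate(guard_echo):
    """Replay the problematic message order; return True if node2 ends up UP on node1."""
    node1 = Node("node1", live={"node2"})
    node2 = Node("node2", guard_echo=guard_echo)
    node2.shutting_down = True          # drain() has started on node2
    node1.on_gossip_shutdown(node2)     # shutdown message arrives first
    node1.on_gossip_digest_ack(node2)   # the ack arrives afterwards
    return "node2" in node1.live
```

Calling {{simulate(False)}} replays the unguarded ordering, where node1 ends up treating node2 as UP after the shutdown message. Calling {{simulate(True)}} replays the same ordering with the guard enabled, where node2 stays down on node1.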
[cci|https://app.circleci.com/pipelines/github/krummas/cassandra/575/workflows/242a0b75-e3a1-4e29-84d1-3353a32d4096]
> Flaky Test: TestIncRepair.test_repaired_tracking_with_mismatching_replicas
> --------------------------------------------------------------------------
>
> Key: CASSANDRA-16094
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16094
> Project: Cassandra
> Issue Type: Bug
> Components: Test/dtest/python
> Reporter: Caleb Rackliffe
> Assignee: Marcus Eriksson
> Priority: Normal
> Labels: dtest, incremental_repair, repair
> Fix For: 4.0-beta
>
>
> We have two recent failures for this test on trunk:
> 1.) https://app.circleci.com/pipelines/github/maedhroz/cassandra/102/workflows/37ed8dab-9da4-4730-a883-20b7a99d88b4/jobs/518/tests (CASSANDRA-15909)
> 2.) https://app.circleci.com/pipelines/github/jolynch/cassandra/6/workflows/41e080e0-d7ff-4256-899e-b4010c6ef5ab/jobs/716/tests (CASSANDRA-15379)
> The test expects mismatches to exist and read repair to be executed on a
> subsequent SELECT, but either those mismatches aren’t there, read repair isn’t
> happening, or both.