ZhaoYang created CASSANDRA-15861:
------------------------------------
Summary: Mutating sstable STATS metadata may race with
entire-sstable-streaming (ZCS) causing checksum validation failure
Key: CASSANDRA-15861
URL: https://issues.apache.org/jira/browse/CASSANDRA-15861
Project: Cassandra
Issue Type: Bug
Components: Consistency/Repair, Consistency/Streaming
Reporter: ZhaoYang
Flaky dtest: [test_dead_sync_initiator -
repair_tests.repair_test.TestRepair|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/143/testReport/junit/dtest.repair_tests.repair_test/TestRepair/test_dead_sync_initiator/]
The above test executes "nodetool repair" on node1 and kills node2
during repair. At the end, node3 reports a checksum validation failure on the
sstable transferred from node1.
{code:java|title=what happened}
1. When repair starts on node1, it performs anti-compaction, which sets the
sstable's repairAt to 0 and its pending repair id to the session id.
2. Then node1 creates {{ComponentManifest}} which contains file lengths to be
transferred to node3.
3. Before node1 actually sends the files to node3, node2 is killed and node1
starts to broadcast a repair-failure-message to all participants in
{{CoordinatorSession#fail}}.
4. Node1 receives its own repair-failure-message and fails its local repair
sessions at {{LocalSessions#failSession}} which triggers async background
compaction.
5. Node1's background compaction will mutate sstable's repairAt to 0 and
pending repair id to null via
{{PendingRepairManager#getNextRepairFinishedTask}}, as there is no more
in-progress repair.
6. Node1 then sends the sstable to node3, but by now the sstable's STATS
component size differs from the original size recorded in the manifest.
7. At the end, node3 reports a checksum validation failure when it tries to
mutate the sstable level and "isTransient" attribute in
{{CassandraEntireSSTableStreamReader#read}}.
{code}
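The sequence above can be reproduced in miniature. The following is only a sketch under stated assumptions: {{StatsRaceSketch}} and all of its methods are hypothetical names, not Cassandra code, and the STATS file is modelled as a small text file whose length changes when the pending repair id is cleared. It shows why a length recorded at manifest time (step 2) no longer matches the file at send time (step 6) after an in-place mutation (step 5).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Cassandra.
class StatsRaceSketch {

    // Creates a temporary stand-in for the on-disk STATS component.
    static Path newTempStats(String content) {
        try {
            Path p = Files.createTempFile("sketch", "-Statistics.db");
            p.toFile().deleteOnExit();
            Files.write(p, content.getBytes(StandardCharsets.UTF_8));
            return p;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Step 2: record the current on-disk length of each component.
    static Map<String, Long> buildManifest(Path... components) {
        Map<String, Long> manifest = new LinkedHashMap<>();
        try {
            for (Path p : components) {
                manifest.put(p.getFileName().toString(), Files.size(p));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return manifest;
    }

    // Step 5: an in-place metadata mutation that rewrites the file.
    static void mutateStats(Path stats, String newContent) {
        try {
            Files.write(stats, newContent.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Step 6: at send time, compare the recorded length with the current one.
    static boolean lengthStillMatches(Map<String, Long> manifest, Path component) {
        try {
            long recorded = manifest.get(component.getFileName().toString());
            return recorded == Files.size(component);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path stats = newTempStats("repairedAt=0;pendingRepair=session-id");
        Map<String, Long> manifest = buildManifest(stats);
        System.out.println("before mutation: " + lengthStillMatches(manifest, stats));

        // Background task clears the pending repair id after the session fails.
        mutateStats(stats, "repairedAt=0;pendingRepair=null");
        System.out.println("after mutation:  " + lengthStillMatches(manifest, stats));
    }
}
```

In the sketch the mismatch is detected by an explicit length comparison; in the reported failure it surfaces later, as a checksum validation error on the receiving node.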
I believe a similar race may happen with leveled compaction, which may directly
mutate an sstable's level if it doesn't overlap with sstables at the next level.
(Note: this isn't a problem in legacy streaming, as the STATS file length didn't
matter there.)
Ideally, sstable STATS metadata would be immutable, just like the
other sstable components, so we wouldn't have to worry about this special case.
For now, I suggest using a {{StatsMetadata}} snapshot when initializing
{{CassandraOutgoingFile}} instead of relying on the mutable on-disk STATS file.
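A rough sketch of the snapshot idea follows. It is an assumption about the shape of the fix, not the actual patch: {{SnapshotSketch}}, {{MutableStats}} and {{OutgoingFile}} are hypothetical stand-ins for the on-disk STATS file and {{CassandraOutgoingFile}}. The point is that both the manifest length and the bytes sent come from the same in-memory copy taken at construction time, so a later mutation cannot make them disagree.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration only; none of these names exist in Cassandra.
class SnapshotSketch {

    // Stand-in for the mutable on-disk STATS file.
    static final class MutableStats {
        private final AtomicReference<byte[]> bytes;

        MutableStats(byte[] initial) {
            this.bytes = new AtomicReference<>(initial.clone());
        }

        // Replaces the content, as a metadata mutation would.
        void mutate(byte[] replacement) {
            bytes.set(replacement.clone());
        }

        byte[] read() {
            return bytes.get().clone();
        }
    }

    // Stand-in for an outgoing file that snapshots STATS when it is created.
    static final class OutgoingFile {
        private final byte[] statsSnapshot;

        OutgoingFile(MutableStats stats) {
            // Copy once; later mutations of the source are not observed.
            this.statsSnapshot = stats.read();
        }

        // Length advertised in the manifest, derived from the snapshot.
        long manifestLength() {
            return statsSnapshot.length;
        }

        // Bytes actually sent, also derived from the snapshot.
        byte[] payload() {
            return statsSnapshot.clone();
        }
    }

    public static void main(String[] args) {
        MutableStats stats = new MutableStats(
                "pendingRepair=session-id".getBytes(StandardCharsets.UTF_8));
        OutgoingFile out = new OutgoingFile(stats);

        // Concurrent mutation after the outgoing file was initialized.
        stats.mutate("pendingRepair=null".getBytes(StandardCharsets.UTF_8));

        System.out.println("manifest length == payload length: "
                + (out.manifestLength() == out.payload().length));
    }
}
```

The trade-off in this sketch is that the receiver gets the metadata as it was when streaming was planned, which may be slightly stale relative to the sender's disk.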