Andrew Hust created CASSANDRA-10874:
---------------------------------------

Summary: running stress with compaction strategy and replication factor fails on read after write
Key: CASSANDRA-10874
URL: https://issues.apache.org/jira/browse/CASSANDRA-10874
Project: Cassandra
Issue Type: Bug
Components: Tools
Reporter: Andrew Hust

Running a read stress after a write stress, with a compaction strategy specified and a replication factor matching the node count, fails with an exception:
{code}
Operation x0 on key(s) [38343433384b34364c30]: Data returned was not validated
{code}

Example run:
{code}
ccm create stress -v git:cassandra-3.0 -n 3 -s
ccm node1 stress write n=10M -rate threads=300 -schema replication\(factor=3\) compaction\(strategy=org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy\)
ccm node1 nodetool flush
ccm node1 nodetool compactionstats # check until quiet
ccm node1 stress read n=10M -rate threads=300
{code}

- This fails with or without vnodes, but occasionally passes without vnodes.
- Changing the read phase to CL=QUORUM makes it pass.
- Removing the replication factor on write makes it pass.
- It happens with all compaction strategies.

With that in mind, I attempted to add a repair after the write phase. This leads to one of two outcomes.

1: a repair that reports greater than 100% completion. It usually stalls after a bit, but I have seen it get to >400% progress:
{code}
id                                     compaction type   keyspace    table       completed     total         unit    progress
2d5344c0-9dc8-11e5-9d5f-4fdec8d76c27   Validation        keyspace1   standard1   94722609949   44035292145   bytes   215.11%
{code}

2: a repair that has a greatly inflated completed/total value. It crunches for a bit and then locks up:
{code}
id                                     compaction type   keyspace    table       completed   total          unit    progress
8c4cf7f0-a34a-11e5-a321-777be88c58ae   Validation        keyspace1   standard1   0           874811100900   bytes   0.00%

❯ du -sh ~/.ccm/stress/node1/
2.4G    ~/.ccm/stress/node1/
❯ du -sh ~/.ccm/stress
7.1G    ~/.ccm/stress
{code}

This has been reproduced on cassandra-3.0 and cassandra-2.2, both locally and using cstar_perf (links below).
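
For anyone sanity-checking the numbers in the two compactionstats outputs above, here is a minimal sketch of the arithmetic. The helper names (progress_pct, to_gib) are made up for illustration and are not part of any Cassandra tooling, and it assumes the du figures are in GiB. It shows that the first outcome's progress is simply completed/total, and that the second outcome's total comes to roughly 815 GiB, over 100 times the 7.1G the whole ccm cluster occupies on disk.

```python
def progress_pct(completed: int, total: int) -> float:
    """Progress as a percentage, rounded to two decimals like the table."""
    return round(100.0 * completed / total, 2)


def to_gib(n_bytes: int) -> float:
    """Convert a byte count to GiB (assumed to match du -sh units)."""
    return n_bytes / 2**30


# Outcome 1: completed exceeds total, so progress goes past 100%.
print(progress_pct(94722609949, 44035292145))  # 215.11

# Outcome 2: reported validation total vs. the 7.1G on disk for the cluster.
reported_total_gib = to_gib(874811100900)
cluster_on_disk_gib = 7.1  # taken from the du output in the report
print(f"reported total: {reported_total_gib:.1f} GiB")
print(f"ratio to on-disk size: {reported_total_gib / cluster_on_disk_gib:.0f}x")
```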

A big twist is that cassandra-2.2 passes the majority of the time: it completes successfully without the repair in 8 out of 10 runs. This can be seen in the cstar_perf links below.

cstar_perf runs:
http://cstar.datastax.com/tests/id/a8b6af02-a2ce-11e5-bb72-0256e416528f
http://cstar.datastax.com/tests/id/a254c572-a2ce-11e5-a8b9-0256e416528f



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)