Andrew Hust created CASSANDRA-10874:
---------------------------------------
Summary: running stress with compaction strategy and replication
factor fails on read after write
Key: CASSANDRA-10874
URL: https://issues.apache.org/jira/browse/CASSANDRA-10874
Project: Cassandra
Issue Type: Bug
Components: Tools
Reporter: Andrew Hust
Running a read stress after a write stress that sets a compaction strategy and
a replication factor matching the node count fails with an exception:
{code}
Operation x0 on key(s) [38343433384b34364c30]: Data returned was not validated
{code}
Example run:
{code}
ccm create stress -v git:cassandra-3.0 -n 3 -s
ccm node1 stress write n=10M -rate threads=300 -schema replication\(factor=3\) \
    compaction\(strategy=org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy\)
ccm node1 nodetool flush
ccm node1 nodetool compactionstats # check until quiet
ccm node1 stress read n=10M -rate threads=300
{code}
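The "check until quiet" step can be scripted rather than run by hand. The following is a minimal sketch, not part of the original run: `wait_until_quiet` is a hypothetical helper, and it assumes that `nodetool compactionstats` prints a line of the form `pending tasks: N`.

```shell
#!/bin/sh
# Hypothetical helper: re-run a command until its output reports zero
# pending compaction tasks, or give up after max_tries attempts.
# Assumption: the output contains a line "pending tasks: N".
wait_until_quiet() {
    interval="$1"
    max_tries="$2"
    shift 2
    tries=0
    while [ "$tries" -lt "$max_tries" ]; do
        # Succeed as soon as the command reports no pending tasks.
        if "$@" 2>/dev/null | grep -Eq '^pending tasks: 0[[:space:]]*$'; then
            return 0
        fi
        tries=$((tries + 1))
        sleep "$interval"
    done
    return 1
}
```

For example, `wait_until_quiet 10 360 ccm node1 nodetool compactionstats` would poll every 10 seconds for up to an hour; both values are arbitrary.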
- This fails both with and without vnodes, but occasionally passes without vnodes.
- Changing the read phase to CL=QUORUM makes it pass.
- Removing the replication factor on write makes it pass.
- It happens with all compaction strategies.
With that in mind, I attempted to add a repair after the write phase. This
leads to one of two outcomes.
1: a repair that reports greater than 100% completion. It usually stalls after a
bit, but I have seen it get to >400% progress:
{code}
id                                   compaction type keyspace  table     completed   total       unit  progress
2d5344c0-9dc8-11e5-9d5f-4fdec8d76c27 Validation      keyspace1 standard1 94722609949 44035292145 bytes 215.11%
{code}
2: a repair that has a greatly inflated completed/total value. It will crunch
for a bit, then lock up:
{code}
id                                   compaction type keyspace  table     completed total        unit  progress
8c4cf7f0-a34a-11e5-a321-777be88c58ae Validation      keyspace1 standard1 0         874811100900 bytes 0.00%
❯ du -sh ~/.ccm/stress/node1/
2.4G ~/.ccm/stress/node1/
❯ du -sh ~/.ccm/stress
7.1G ~/.ccm/stress
{code}
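Both symptoms can be checked mechanically from the compactionstats output. The sketch below is an illustration only: `overrun_rows` and `bytes_to_gib` are hypothetical helper names, and the parsing assumes each compactionstats row is on a single line with completed, total, unit and progress as its last four fields.

```shell
#!/bin/sh
# Hypothetical helper: read compactionstats rows on stdin and print the id
# and recomputed progress of every row whose completed count exceeds total.
# Fields are counted from the end of the line so a multi-word compaction
# type does not shift them.
overrun_rows() {
    awk 'NF >= 8 && $(NF-3) ~ /^[0-9]+$/ && $(NF-2) ~ /^[0-9]+$/ {
        completed = $(NF-3) + 0
        total = $(NF-2) + 0
        if (total > 0 && completed > total)
            printf "%s %.2f%%\n", $1, 100 * completed / total
    }'
}

# Hypothetical helper: convert a byte count to GiB (1024^3 bytes) so it can
# be compared with the sizes reported by du -sh.
bytes_to_gib() {
    awk -v b="$1" 'BEGIN { printf "%.1f\n", b / (1024 * 1024 * 1024) }'
}
```

Fed the row from outcome 1, `overrun_rows` prints its id followed by 215.11%. For outcome 2, `bytes_to_gib 874811100900` prints 814.7, against the 2.4G that du reports for node1.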
This has been reproduced on cassandra-3.0 and cassandra-2.2 both locally and
using cstar_perf (links below).
One twist is that cassandra-2.2 passes the majority of the time: it completes
successfully without the repair in 8 out of 10 runs. This can be seen in
the cstar_perf links below.
cstar_perf runs:
http://cstar.datastax.com/tests/id/a8b6af02-a2ce-11e5-bb72-0256e416528f
http://cstar.datastax.com/tests/id/a254c572-a2ce-11e5-a8b9-0256e416528f