Hiro Wakabayashi created CASSANDRA-13863:
--------------------------------------------
Summary: Speculative retry causes read repair even if read_repair_chance is 0.0.
Key: CASSANDRA-13863
URL: https://issues.apache.org/jira/browse/CASSANDRA-13863
Project: Cassandra
Issue Type: Improvement
Reporter: Hiro Wakabayashi
{{read_repair_chance = 0.0}} and {{dclocal_read_repair_chance = 0.0}} should cause no read repair, but read repair still happens when speculative retry is enabled. I think {{read_repair_chance = 0.0}} and {{dclocal_read_repair_chance = 0.0}} should stop read repair completely, because there are cases where users need read repair disabled entirely.
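For concreteness, this is the option combination I expect to fully disable read repair (a minimal CQL sketch; the complete schema used for the reproduction is in step 2 below):
{noformat}
ALTER TABLE ks1.t1 WITH
    read_repair_chance = 0.0
    AND dclocal_read_repair_chance = 0.0;
{noformat}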
{panel:title=Case 1: TWCS users}
The
[documentation|http://cassandra.apache.org/doc/latest/operating/compaction.html?highlight=read_repair_chance]
states how to disable read repair.
{quote}While TWCS tries to minimize the impact of comingled data, users should
attempt to avoid this behavior. Specifically, users should avoid queries that
explicitly set the timestamp via CQL USING TIMESTAMP. Additionally, users
should run frequent repairs (which streams data in such a way that it does not
become comingled), and disable background read repair by setting the table’s
read_repair_chance and dclocal_read_repair_chance to 0.
{quote}
{panel}
{panel:title=Case 2: Strict SLA for read latency}
At peak times read latency is critical for us, and read repair pushes latency higher than reads without it. We can rely on anti-entropy repair during off-peak hours for consistency.
{panel}
Here is my procedure to reproduce the problem.
h3. 1. Create a cluster and set {{hinted_handoff_enabled}} to false.
{noformat}
$ ccm create -v 3.0.14 -n 3 cluster_3.0.14
$ for h in $(seq 1 3) ; do perl -pi -e 's/hinted_handoff_enabled: true/hinted_handoff_enabled: false/' ~/.ccm/cluster_3.0.14/node$h/conf/cassandra.yaml ; done
$ for h in $(seq 1 3) ; do grep "hinted_handoff_enabled:" ~/.ccm/cluster_3.0.14/node$h/conf/cassandra.yaml ; done
hinted_handoff_enabled: false
hinted_handoff_enabled: false
hinted_handoff_enabled: false
$ ccm start
{noformat}
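The same change can presumably be made with ccm's own config helper instead of editing each cassandra.yaml by hand (a sketch, assuming the {{updateconf}} subcommand is available in your ccm version; it applies to all nodes):
{noformat}
$ ccm updateconf 'hinted_handoff_enabled: false'
{noformat}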
h3. 2. Create a keyspace and a table.
{noformat}
$ ccm node1 cqlsh
DROP KEYSPACE IF EXISTS ks1;
CREATE KEYSPACE ks1 WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '3'} AND durable_writes = true;
CREATE TABLE ks1.t1 (
    key text PRIMARY KEY,
    value blob
) WITH bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.0
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = 'ALWAYS';
QUIT;
{noformat}
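To double-check that the options took effect, the schema tables can be queried (a sketch, assuming the Cassandra 3.x {{system_schema}} layout):
{noformat}
SELECT read_repair_chance, dclocal_read_repair_chance, speculative_retry
FROM system_schema.tables
WHERE keyspace_name = 'ks1' AND table_name = 't1';
{noformat}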
h3. 3. Stop node2 and node3. Insert a row.
{noformat}
$ ccm node3 stop && ccm node2 stop && ccm status
Cluster: 'cluster_3.0.14'
----------------------
node1: UP
node3: DOWN
node2: DOWN
$ ccm node1 cqlsh -k ks1 -e "consistency; tracing on; insert into ks1.t1 (key, value) values ('mmullass', bigintAsBlob(1));"
Current consistency level is ONE.
Now Tracing is enabled
Tracing session: 01d74590-97cb-11e7-8ea7-c1bd4d549501
 activity | timestamp | source | source_elapsed
------------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------
 Execute CQL3 query | 2017-09-12 23:59:42.316000 | 127.0.0.1 | 0
 Parsing insert into ks1.t1 (key, value) values ('mmullass', bigintAsBlob(1)); [SharedPool-Worker-1] | 2017-09-12 23:59:42.319000 | 127.0.0.1 | 4323
 Preparing statement [SharedPool-Worker-1] | 2017-09-12 23:59:42.320000 | 127.0.0.1 | 5250
 Determining replicas for mutation [SharedPool-Worker-1] | 2017-09-12 23:59:42.327000 | 127.0.0.1 | 11886
 Appending to commitlog [SharedPool-Worker-3] | 2017-09-12 23:59:42.327000 | 127.0.0.1 | 12195
 Adding to t1 memtable [SharedPool-Worker-3] | 2017-09-12 23:59:42.327000 | 127.0.0.1 | 12392
 Request complete | 2017-09-12 23:59:42.328680 | 127.0.0.1 | 12680
$ ccm node1 cqlsh -k ks1 -e "consistency; tracing on; select * from ks1.t1 where key = 'mmullass';"
Current consistency level is ONE.
Now Tracing is enabled
key | value
----------+--------------------
mmullass | 0x0000000000000001
(1 rows)
Tracing session: 3420ce90-97cb-11e7-8ea7-c1bd4d549501
 activity | timestamp | source | source_elapsed
-----------------------------------------------------------------------------+----------------------------+-----------+----------------
 Execute CQL3 query | 2017-09-13 00:01:06.681000 | 127.0.0.1 | 0
 Parsing select * from ks1.t1 where key = 'mmullass'; [SharedPool-Worker-1] | 2017-09-13 00:01:06.681000 | 127.0.0.1 | 296
 Preparing statement [SharedPool-Worker-1] | 2017-09-13 00:01:06.681000 | 127.0.0.1 | 561
 Executing single-partition query on t1 [SharedPool-Worker-2] | 2017-09-13 00:01:06.682000 | 127.0.0.1 | 1056
 Acquiring sstable references [SharedPool-Worker-2] | 2017-09-13 00:01:06.682000 | 127.0.0.1 | 1142
 Merging memtable contents [SharedPool-Worker-2] | 2017-09-13 00:01:06.682000 | 127.0.0.1 | 1206
 Read 1 live and 0 tombstone cells [SharedPool-Worker-2] | 2017-09-13 00:01:06.682000 | 127.0.0.1 | 1455
 Request complete | 2017-09-13 00:01:06.682794 | 127.0.0.1 | 1794
{noformat}
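As an optional cross-check (not part of the original run), node1's on-disk data can be inspected the same way node2 is checked in step 4, using ccm's data directory layout:
{noformat}
$ ccm node1 nodetool flush
$ ls ~/.ccm/cluster_3.0.14/node1/data0/ks1/t1-*/*-Data.db
{noformat}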
h3. 4. Start node2 and confirm node2 has no data.
{noformat}
$ ccm node2 start && ccm status
Cluster: 'cluster_3.0.14'
-------------------------
node1: UP
node3: DOWN
node2: UP
$ ccm node2 nodetool flush
$ ls ~/.ccm/cluster_3.0.14/node2/data0/ks1/t1-*/*-Data.db
ls: /Users/hiwakaba/.ccm/cluster_3.0.14/node2/data0/ks1/t1-*/*-Data.db: No such file or directory
{noformat}
h3. 5. Select the row from node2; read repair is triggered.
{noformat}
$ ccm node2 cqlsh -k ks1 -e "consistency; tracing on; select * from ks1.t1 where key = 'mmullass';"
Current consistency level is ONE.
Now Tracing is enabled
key | value
-----+-------
(0 rows)
Tracing session: 72a71fc0-97cb-11e7-83cc-a3af9d3da979
 activity | timestamp | source | source_elapsed
------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------
 Execute CQL3 query | 2017-09-13 00:02:51.582000 | 127.0.0.2 | 0
 Parsing select * from ks1.t1 where key = 'mmullass'; [SharedPool-Worker-2] | 2017-09-13 00:02:51.583000 | 127.0.0.2 | 1112
 Preparing statement [SharedPool-Worker-2] | 2017-09-13 00:02:51.583000 | 127.0.0.2 | 1412
 reading data from /127.0.0.1 [SharedPool-Worker-2] | 2017-09-13 00:02:51.584000 | 127.0.0.2 | 2107
 Executing single-partition query on t1 [SharedPool-Worker-1] | 2017-09-13 00:02:51.585000 | 127.0.0.2 | 3492
 Sending READ message to /127.0.0.1 [MessagingService-Outgoing-/127.0.0.1] | 2017-09-13 00:02:51.585000 | 127.0.0.2 | 3516
 Acquiring sstable references [SharedPool-Worker-1] | 2017-09-13 00:02:51.585000 | 127.0.0.2 | 3595
 Merging memtable contents [SharedPool-Worker-1] | 2017-09-13 00:02:51.585001 | 127.0.0.2 | 3673
 Read 0 live and 0 tombstone cells [SharedPool-Worker-1] | 2017-09-13 00:02:51.585001 | 127.0.0.2 | 3851
 READ message received from /127.0.0.2 [MessagingService-Incoming-/127.0.0.2] | 2017-09-13 00:02:51.588000 | 127.0.0.1 | 33
 Acquiring sstable references [SharedPool-Worker-2] | 2017-09-13 00:02:51.600000 | 127.0.0.1 | 12444
 Merging memtable contents [SharedPool-Worker-2] | 2017-09-13 00:02:51.600000 | 127.0.0.1 | 12536
 Read 1 live and 0 tombstone cells [SharedPool-Worker-2] | 2017-09-13 00:02:51.600000 | 127.0.0.1 | 12765
 Enqueuing response to /127.0.0.2 [SharedPool-Worker-2] | 2017-09-13 00:02:51.600000 | 127.0.0.1 | 12929
 Sending REQUEST_RESPONSE message to /127.0.0.2 [MessagingService-Outgoing-/127.0.0.2] | 2017-09-13 00:02:51.602000 | 127.0.0.1 | 14686
 REQUEST_RESPONSE message received from /127.0.0.1 [MessagingService-Incoming-/127.0.0.1] | 2017-09-13 00:02:51.603000 | 127.0.0.2 | --
 Processing response from /127.0.0.1 [SharedPool-Worker-3] | 2017-09-13 00:02:51.610000 | 127.0.0.2 | --
 Initiating read-repair [SharedPool-Worker-3] | 2017-09-13 00:02:51.610000 | 127.0.0.2 | --
 Digest mismatch: org.apache.cassandra.service.DigestMismatchException: Mismatch for key DecoratedKey(-4886857781295767937, 6d6d756c6c617373) (d41d8cd98f00b204e9800998ecf8427e vs f8e0f9262a889cd3ebf4e5d50159757b) [ReadRepairStage:1] | 2017-09-13 00:02:51.624000 | 127.0.0.2 | --
 Request complete | 2017-09-13 00:02:51.586892 | 127.0.0.2 | 4892
{noformat}
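The trace above is the crux: node2 contacts node1 only because {{speculative_retry = 'ALWAYS'}} adds an extra replica to the read, and once the two replicas' digests disagree, a foreground read repair is initiated regardless of {{read_repair_chance}} / {{dclocal_read_repair_chance}} (those settings only control the chance-based background read repair). If the goal is simply to avoid this path today, the option that appears to matter is speculative retry itself; a minimal CQL sketch of that change (not verified as part of this reproduction):
{noformat}
ALTER TABLE ks1.t1 WITH speculative_retry = 'NONE';
{noformat}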
h3. 6. As a result, node2 has the row.
{noformat}
$ ccm node2 cqlsh -k ks1 -e "consistency; tracing on; select * from ks1.t1 where key = 'mmullass';"
Current consistency level is ONE.
Now Tracing is enabled
key | value
----------+--------------------
mmullass | 0x0000000000000001
(1 rows)
Tracing session: 78526330-97cb-11e7-83cc-a3af9d3da979
 activity | timestamp | source | source_elapsed
---------------------------------------------------------------------------------------------+----------------------------+-----------+----------------
 Execute CQL3 query | 2017-09-13 00:03:01.091000 | 127.0.0.2 | 0
 Parsing select * from ks1.t1 where key = 'mmullass'; [SharedPool-Worker-3] | 2017-09-13 00:03:01.091000 | 127.0.0.2 | 216
 Preparing statement [SharedPool-Worker-3] | 2017-09-13 00:03:01.091000 | 127.0.0.2 | 390
 reading data from /127.0.0.1 [SharedPool-Worker-3] | 2017-09-13 00:03:01.091000 | 127.0.0.2 | 808
 Executing single-partition query on t1 [SharedPool-Worker-2] | 2017-09-13 00:03:01.092000 | 127.0.0.2 | 1041
 READ message received from /127.0.0.2 [MessagingService-Incoming-/127.0.0.2] | 2017-09-13 00:03:01.092000 | 127.0.0.1 | 33
 Sending READ message to /127.0.0.1 [MessagingService-Outgoing-/127.0.0.1] | 2017-09-13 00:03:01.092000 | 127.0.0.2 | 1036
 Executing single-partition query on t1 [SharedPool-Worker-1] | 2017-09-13 00:03:01.092000 | 127.0.0.1 | 189
 Acquiring sstable references [SharedPool-Worker-2] | 2017-09-13 00:03:01.092000 | 127.0.0.2 | 1113
 Acquiring sstable references [SharedPool-Worker-1] | 2017-09-13 00:03:01.092000 | 127.0.0.1 | 276
 Merging memtable contents [SharedPool-Worker-2] | 2017-09-13 00:03:01.092000 | 127.0.0.2 | 1172
 Merging memtable contents [SharedPool-Worker-1] | 2017-09-13 00:03:01.092000 | 127.0.0.1 | 332
 REQUEST_RESPONSE message received from /127.0.0.1 [MessagingService-Incoming-/127.0.0.1] | 2017-09-13 00:03:01.093000 | 127.0.0.2 | --
 Read 1 live and 0 tombstone cells [SharedPool-Worker-1] | 2017-09-13 00:03:01.093000 | 127.0.0.1 | 565
 Enqueuing response to /127.0.0.2 [SharedPool-Worker-1] | 2017-09-13 00:03:01.093000 | 127.0.0.1 | 648
 Sending REQUEST_RESPONSE message to /127.0.0.2 [MessagingService-Outgoing-/127.0.0.2] | 2017-09-13 00:03:01.093000 | 127.0.0.1 | 783
 Processing response from /127.0.0.1 [SharedPool-Worker-1] | 2017-09-13 00:03:01.094000 | 127.0.0.2 | --
 Initiating read-repair [SharedPool-Worker-1] | 2017-09-13 00:03:01.099000 | 127.0.0.2 | --
 Read 1 live and 0 tombstone cells [SharedPool-Worker-2] | 2017-09-13 00:03:01.101000 | 127.0.0.2 | 10113
 Request complete | 2017-09-13 00:03:01.092830 | 127.0.0.2 | 1830
$ ccm node2 nodetool flush
$ ls ~/.ccm/cluster_3.0.14/node2/data0/ks1/t1-*/*-Data.db
/Users/hiwakaba/.ccm/cluster_3.0.14/node2/data0/ks1/t1-ec659e0097ca11e78ea7c1bd4d549501/mc-1-big-Data.db
$ ~/.ccm/repository/3.0.14/tools/bin/sstabledump /Users/hiwakaba/.ccm/cluster_3.0.14/node2/data0/ks1/t1-ec659e0097ca11e78ea7c1bd4d549501/mc-1-big-Data.db -k mmullass
[
  {
    "partition" : {
      "key" : [ "mmullass" ],
      "position" : 0
    },
    "rows" : [
      {
        "type" : "row",
        "position" : 36,
        "liveness_info" : { "tstamp" : "2017-09-12T14:59:42.312969Z" },
        "cells" : [
          { "name" : "value", "value" : "0000000000000001" }
        ]
      }
    ]
  }
]
{noformat}
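The {{liveness_info}} timestamp matches the write time of the original insert from step 3 (shown in UTC here), so the row on node2 came from read repair rather than from a new client write. The same check can be done from cqlsh (a small sketch, not part of the original run):
{noformat}
$ ccm node2 cqlsh -k ks1 -e "select key, writetime(value) from ks1.t1 where key = 'mmullass';"
{noformat}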
In [CASSANDRA-11409|https://issues.apache.org/jira/browse/CASSANDRA-11409], [~cam1982] commented that this behavior was not a bug, so I filed this issue as an improvement.