[ https://issues.apache.org/jira/browse/CASSANDRA-13863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hiro Wakabayashi updated CASSANDRA-13863:
-----------------------------------------
    Description:
{{read_repair_chance = 0.0}} and {{dclocal_read_repair_chance = 0.0}} should cause no read repair, but read repair happens with speculative retry. I think {{read_repair_chance = 0.0}} and {{dclocal_read_repair_chance = 0.0}} should stop read repair completely because the user wants to stop read repair in some cases.

{panel:title=Case 1: TWCS users}
The [documentation|http://cassandra.apache.org/doc/latest/operating/compaction.html?highlight=read_repair_chance] states how to disable read repair.
{quote}While TWCS tries to minimize the impact of comingled data, users should attempt to avoid this behavior. Specifically, users should avoid queries that explicitly set the timestamp via CQL USING TIMESTAMP. Additionally, users should run frequent repairs (which streams data in such a way that it does not become comingled), and disable background read repair by setting the table’s read_repair_chance and dclocal_read_repair_chance to 0.
{quote}
{panel}
{panel:title=Case 2. Strict SLA for read latency}
At peak times, read latency is key for us, but read repair makes latency higher than it is without read repair. We can run anti-entropy repair during off-peak times for consistency.
{panel}

Here is my procedure to reproduce the problem.

h3. 1. Create a cluster and set {{hinted_handoff_enabled}} to false.
{noformat}
$ ccm create -v 3.0.14 -n 3 cluster_3.0.14
$ for h in $(seq 1 3) ; do perl -pi -e 's/hinted_handoff_enabled: true/hinted_handoff_enabled: false/' ~/.ccm/cluster_3.0.14/node$h/conf/cassandra.yaml ; done
$ for h in $(seq 1 3) ; do grep "hinted_handoff_enabled:" ~/.ccm/cluster_3.0.14/node$h/conf/cassandra.yaml ; done
hinted_handoff_enabled: false
hinted_handoff_enabled: false
hinted_handoff_enabled: false
$ ccm start
{noformat}

h3. 2. Create a keyspace and a table.
{noformat}
$ ccm node1 cqlsh
DROP KEYSPACE IF EXISTS ks1;
CREATE KEYSPACE ks1 WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '3'} AND durable_writes = true;
CREATE TABLE ks1.t1 (
    key text PRIMARY KEY,
    value blob
) WITH bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.0
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = 'ALWAYS';
QUIT;
{noformat}

h3. 3. Stop node2 and node3. Insert a row.
{noformat}
$ ccm node3 stop && ccm node2 stop && ccm status
Cluster: 'cluster_3.0.14'
----------------------
node1: UP
node3: DOWN
node2: DOWN

$ ccm node1 cqlsh -k ks1 -e "consistency; tracing on; insert into ks1.t1 (key, value) values ('mmullass', bigintAsBlob(1));"
Current consistency level is ONE.
Now Tracing is enabled

Tracing session: 01d74590-97cb-11e7-8ea7-c1bd4d549501

 activity | timestamp | source | source_elapsed
-----------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------
 Execute CQL3 query | 2017-09-12 23:59:42.316000 | 127.0.0.1 | 0
 Parsing insert into ks1.t1 (key, value) values ('mmullass', bigintAsBlob(1)); [SharedPool-Worker-1] | 2017-09-12 23:59:42.319000 | 127.0.0.1 | 4323
 Preparing statement [SharedPool-Worker-1] | 2017-09-12 23:59:42.320000 | 127.0.0.1 | 5250
 Determining replicas for mutation [SharedPool-Worker-1] | 2017-09-12 23:59:42.327000 | 127.0.0.1 | 11886
 Appending to commitlog [SharedPool-Worker-3] | 2017-09-12 23:59:42.327000 | 127.0.0.1 | 12195
 Adding to t1 memtable [SharedPool-Worker-3] | 2017-09-12 23:59:42.327000 | 127.0.0.1 | 12392
 Request complete | 2017-09-12 23:59:42.328680 | 127.0.0.1 | 12680

$ ccm node1 cqlsh -k ks1 -e "consistency; tracing on; select * from ks1.t1 where key = 'mmullass';"
Current consistency level is ONE.
Now Tracing is enabled

 key      | value
----------+--------------------
 mmullass | 0x0000000000000001

(1 rows)

Tracing session: 3420ce90-97cb-11e7-8ea7-c1bd4d549501

 activity | timestamp | source | source_elapsed
----------------------------------------------------------------------------+----------------------------+-----------+----------------
 Execute CQL3 query | 2017-09-13 00:01:06.681000 | 127.0.0.1 | 0
 Parsing select * from ks1.t1 where key = 'mmullass'; [SharedPool-Worker-1] | 2017-09-13 00:01:06.681000 | 127.0.0.1 | 296
 Preparing statement [SharedPool-Worker-1] | 2017-09-13 00:01:06.681000 | 127.0.0.1 | 561
 Executing single-partition query on t1 [SharedPool-Worker-2] | 2017-09-13 00:01:06.682000 | 127.0.0.1 | 1056
 Acquiring sstable references [SharedPool-Worker-2] | 2017-09-13 00:01:06.682000 | 127.0.0.1 | 1142
 Merging memtable contents [SharedPool-Worker-2] | 2017-09-13 00:01:06.682000 | 127.0.0.1 | 1206
 Read 1 live and 0 tombstone cells [SharedPool-Worker-2] | 2017-09-13 00:01:06.682000 | 127.0.0.1 | 1455
 Request complete | 2017-09-13 00:01:06.682794 | 127.0.0.1 | 1794
{noformat}

h3. 4. Start node2 and confirm node2 has no data.
{noformat}
$ ccm node2 start && ccm status
Cluster: 'cluster_3.0.14'
-------------------------
node1: UP
node3: DOWN
node2: UP

$ ccm node2 nodetool flush
$ ls ~/.ccm/cluster_3.0.14/node2/data0/ks1/t1-*/*-Data.db
ls: /Users/hiwakaba/.ccm/cluster_3.0.14/node2/data0/ks1/t1-*/*-Data.db: No such file or directory
{noformat}

h3. 5. Select the row from node2 and read repair works.
{noformat}
$ ccm node2 cqlsh -k ks1 -e "consistency; tracing on; select * from ks1.t1 where key = 'mmullass';"
Current consistency level is ONE.
Now Tracing is enabled

 key | value
-----+-------

(0 rows)

Tracing session: 72a71fc0-97cb-11e7-83cc-a3af9d3da979

 activity | timestamp | source | source_elapsed
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------
 Execute CQL3 query | 2017-09-13 00:02:51.582000 | 127.0.0.2 | 0
 Parsing select * from ks1.t1 where key = 'mmullass'; [SharedPool-Worker-2] | 2017-09-13 00:02:51.583000 | 127.0.0.2 | 1112
 Preparing statement [SharedPool-Worker-2] | 2017-09-13 00:02:51.583000 | 127.0.0.2 | 1412
 reading data from /127.0.0.1 [SharedPool-Worker-2] | 2017-09-13 00:02:51.584000 | 127.0.0.2 | 2107
 Executing single-partition query on t1 [SharedPool-Worker-1] | 2017-09-13 00:02:51.585000 | 127.0.0.2 | 3492
 Sending READ message to /127.0.0.1 [MessagingService-Outgoing-/127.0.0.1] | 2017-09-13 00:02:51.585000 | 127.0.0.2 | 3516
 Acquiring sstable references [SharedPool-Worker-1] | 2017-09-13 00:02:51.585000 | 127.0.0.2 | 3595
 Merging memtable contents [SharedPool-Worker-1] | 2017-09-13 00:02:51.585001 | 127.0.0.2 | 3673
 Read 0 live and 0 tombstone cells [SharedPool-Worker-1] | 2017-09-13 00:02:51.585001 | 127.0.0.2 | 3851
 READ message received from /127.0.0.2 [MessagingService-Incoming-/127.0.0.2] | 2017-09-13 00:02:51.588000 | 127.0.0.1 | 33
 Acquiring sstable references [SharedPool-Worker-2] | 2017-09-13 00:02:51.600000 | 127.0.0.1 | 12444
 Merging memtable contents [SharedPool-Worker-2] | 2017-09-13 00:02:51.600000 | 127.0.0.1 | 12536
 Read 1 live and 0 tombstone cells [SharedPool-Worker-2] | 2017-09-13 00:02:51.600000 | 127.0.0.1 | 12765
 Enqueuing response to /127.0.0.2 [SharedPool-Worker-2] | 2017-09-13 00:02:51.600000 | 127.0.0.1 | 12929
 Sending REQUEST_RESPONSE message to /127.0.0.2 [MessagingService-Outgoing-/127.0.0.2] | 2017-09-13 00:02:51.602000 | 127.0.0.1 | 14686
 REQUEST_RESPONSE message received from /127.0.0.1 [MessagingService-Incoming-/127.0.0.1] | 2017-09-13 00:02:51.603000 | 127.0.0.2 | --
 Processing response from /127.0.0.1 [SharedPool-Worker-3] | 2017-09-13 00:02:51.610000 | 127.0.0.2 | --
 Initiating read-repair [SharedPool-Worker-3] | 2017-09-13 00:02:51.610000 | 127.0.0.2 | --
 Digest mismatch: org.apache.cassandra.service.DigestMismatchException: Mismatch for key DecoratedKey(-4886857781295767937, 6d6d756c6c617373) (d41d8cd98f00b204e9800998ecf8427e vs f8e0f9262a889cd3ebf4e5d50159757b) [ReadRepairStage:1] | 2017-09-13 00:02:51.624000 | 127.0.0.2 | --
 Request complete | 2017-09-13 00:02:51.586892 | 127.0.0.2 | 4892
{noformat}

h3. 6. As a result, node2 has the row.
{noformat}
$ ccm node2 cqlsh -k ks1 -e "consistency; tracing on; select * from ks1.t1 where key = 'mmullass';"
Current consistency level is ONE.
Now Tracing is enabled

 key      | value
----------+--------------------
 mmullass | 0x0000000000000001

(1 rows)

Tracing session: 78526330-97cb-11e7-83cc-a3af9d3da979

 activity | timestamp | source | source_elapsed
------------------------------------------------------------------------------------------+----------------------------+-----------+----------------
 Execute CQL3 query | 2017-09-13 00:03:01.091000 | 127.0.0.2 | 0
 Parsing select * from ks1.t1 where key = 'mmullass'; [SharedPool-Worker-3] | 2017-09-13 00:03:01.091000 | 127.0.0.2 | 216
 Preparing statement [SharedPool-Worker-3] | 2017-09-13 00:03:01.091000 | 127.0.0.2 | 390
 reading data from /127.0.0.1 [SharedPool-Worker-3] | 2017-09-13 00:03:01.091000 | 127.0.0.2 | 808
 Executing single-partition query on t1 [SharedPool-Worker-2] | 2017-09-13 00:03:01.092000 | 127.0.0.2 | 1041
 READ message received from /127.0.0.2 [MessagingService-Incoming-/127.0.0.2] | 2017-09-13 00:03:01.092000 | 127.0.0.1 | 33
 Sending READ message to /127.0.0.1 [MessagingService-Outgoing-/127.0.0.1] | 2017-09-13 00:03:01.092000 | 127.0.0.2 | 1036
 Executing single-partition query on t1 [SharedPool-Worker-1] | 2017-09-13 00:03:01.092000 | 127.0.0.1 | 189
 Acquiring sstable references [SharedPool-Worker-2] | 2017-09-13 00:03:01.092000 | 127.0.0.2 | 1113
 Acquiring sstable references [SharedPool-Worker-1] | 2017-09-13 00:03:01.092000 | 127.0.0.1 | 276
 Merging memtable contents [SharedPool-Worker-2] | 2017-09-13 00:03:01.092000 | 127.0.0.2 | 1172
 Merging memtable contents [SharedPool-Worker-1] | 2017-09-13 00:03:01.092000 | 127.0.0.1 | 332
 REQUEST_RESPONSE message received from /127.0.0.1 [MessagingService-Incoming-/127.0.0.1] | 2017-09-13 00:03:01.093000 | 127.0.0.2 | --
 Read 1 live and 0 tombstone cells [SharedPool-Worker-1] | 2017-09-13 00:03:01.093000 | 127.0.0.1 | 565
 Enqueuing response to /127.0.0.2 [SharedPool-Worker-1] | 2017-09-13 00:03:01.093000 | 127.0.0.1 | 648
 Sending REQUEST_RESPONSE message to /127.0.0.2 [MessagingService-Outgoing-/127.0.0.2] | 2017-09-13 00:03:01.093000 | 127.0.0.1 | 783
 Processing response from /127.0.0.1 [SharedPool-Worker-1] | 2017-09-13 00:03:01.094000 | 127.0.0.2 | --
 Initiating read-repair [SharedPool-Worker-1] | 2017-09-13 00:03:01.099000 | 127.0.0.2 | --
 Read 1 live and 0 tombstone cells [SharedPool-Worker-2] | 2017-09-13 00:03:01.101000 | 127.0.0.2 | 10113
 Request complete | 2017-09-13 00:03:01.092830 | 127.0.0.2 | 1830

$ ccm node2 nodetool flush
$ ls ~/.ccm/cluster_3.0.14/node2/data0/ks1/t1-*/*-Data.db
/Users/hiwakaba/.ccm/cluster_3.0.14/node2/data0/ks1/t1-ec659e0097ca11e78ea7c1bd4d549501/mc-1-big-Data.db
$ ~/.ccm/repository/3.0.14/tools/bin/sstabledump /Users/hiwakaba/.ccm/cluster_3.0.14/node2/data0/ks1/t1-ec659e0097ca11e78ea7c1bd4d549501/mc-1-big-Data.db -k mmullass
[
  {
    "partition" : {
      "key" : [ "mmullass" ],
      "position" : 0
    },
    "rows" : [
      {
        "type" : "row",
        "position" : 36,
        "liveness_info" : { "tstamp" : "2017-09-12T14:59:42.312969Z" },
        "cells" : [
          { "name" : "value", "value" : "0000000000000001" }
        ]
      }
    ]
  }
]
{noformat}

In
[CASSANDRA-11409|https://issues.apache.org/jira/browse/CASSANDRA-11409], [~cam1982] commented this was not a bug. So I filed this issue as an improvement.

> Speculative retry causes read repair even if read_repair_chance is 0.0.
> -----------------------------------------------------------------------
>
> Key: CASSANDRA-13863
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13863
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Hiro Wakabayashi
>
> {{read_repair_chance = 0.0}} and {{dclocal_read_repair_chance = 0.0}} should
> cause no read repair, but read repair happens with speculative retry. I think
> {{read_repair_chance = 0.0}} and {{dclocal_read_repair_chance = 0.0}} should
> stop read repair completely because the user wants to stop read repair in
> some cases.
> {panel:title=Case 1: TWCS users}
> The
> [documentation|http://cassandra.apache.org/doc/latest/operating/compaction.html?highlight=read_repair_chance]
> states how to disable read repair.
> {quote}While TWCS tries to minimize the impact of comingled data, users
> should attempt to avoid this behavior. Specifically, users should avoid
> queries that explicitly set the timestamp via CQL USING TIMESTAMP.
> Additionally, users should run frequent repairs (which streams data in such a
> way that it does not become comingled), and disable background read repair by
> setting the table’s read_repair_chance and dclocal_read_repair_chance to 0.
> {quote}
> {panel}
> {panel:title=Case 2. Strict SLA for read latency}
> In a peak time, read latency is a key for us but, read repair causes latency
> higher than no read repair. We can use anti entropy repair in off peak time
> for consistency.
> {panel}
>
> Here is my procedure to reproduce the problem.
> h3. 1. Create a cluster and set {{hinted_handoff_enabled}} to false.
> {noformat} > $ ccm create -v 3.0.14 -n 3 cluster_3.0.14 > $ for h in $(seq 1 3) ; do perl -pi -e 's/hinted_handoff_enabled: > true/hinted_handoff_enabled: false/' > ~/.ccm/cluster_3.0.14/node$h/conf/cassandra.yaml ; done > $ for h in $(seq 1 3) ; do grep "hinted_handoff_enabled:" > ~/.ccm/cluster_3.0.14/node$h/conf/cassandra.yaml ; done > hinted_handoff_enabled: false > hinted_handoff_enabled: false > hinted_handoff_enabled: false > $ ccm start{noformat} > h3. 2. Create a keyspace and a table. > {noformat} > $ ccm node1 cqlsh > DROP KEYSPACE IF EXISTS ks1; > CREATE KEYSPACE ks1 WITH replication = {'class': 'SimpleStrategy', > 'replication_factor': '3'} AND durable_writes = true; > CREATE TABLE ks1.t1 ( > key text PRIMARY KEY, > value blob > ) WITH bloom_filter_fp_chance = 0.01 > AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'} > AND comment = '' > AND compaction = {'class': > 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', > 'max_threshold': '32', 'min_threshold': '4'} > AND compression = {'chunk_length_in_kb': '64', 'class': > 'org.apache.cassandra.io.compress.LZ4Compressor'} > AND crc_check_chance = 1.0 > AND dclocal_read_repair_chance = 0.0 > AND default_time_to_live = 0 > AND gc_grace_seconds = 864000 > AND max_index_interval = 2048 > AND memtable_flush_period_in_ms = 0 > AND min_index_interval = 128 > AND read_repair_chance = 0.0 > AND speculative_retry = 'ALWAYS'; > QUIT; > {noformat} > h3. 3. Stop node2 and node3. Insert a row. > {noformat} > $ ccm node3 stop && ccm node2 stop && ccm status > Cluster: 'cluster_3.0.14' > ---------------------- > node1: UP > node3: DOWN > node2: DOWN > $ ccm node1 cqlsh -k ks1 -e "consistency; tracing on; insert into ks1.t1 > (key, value) values ('mmullass', bigintAsBlob(1));" > Current consistency level is ONE. 
> Now Tracing is enabled > Tracing session: 01d74590-97cb-11e7-8ea7-c1bd4d549501 > activity > | timestamp | source | > source_elapsed > -----------------------------------------------------------------------------------------------------+----------------------------+-----------+---------------- > > Execute CQL3 query | 2017-09-12 23:59:42.316000 | 127.0.0.1 | > 0 > Parsing insert into ks1.t1 (key, value) values ('mmullass', > bigintAsBlob(1)); [SharedPool-Worker-1] | 2017-09-12 23:59:42.319000 | > 127.0.0.1 | 4323 > Preparing > statement [SharedPool-Worker-1] | 2017-09-12 23:59:42.320000 | 127.0.0.1 | > 5250 > Determining replicas for > mutation [SharedPool-Worker-1] | 2017-09-12 23:59:42.327000 | 127.0.0.1 | > 11886 > Appending to > commitlog [SharedPool-Worker-3] | 2017-09-12 23:59:42.327000 | 127.0.0.1 | > 12195 > Adding to t1 > memtable [SharedPool-Worker-3] | 2017-09-12 23:59:42.327000 | 127.0.0.1 | > 12392 > > Request complete | 2017-09-12 23:59:42.328680 | 127.0.0.1 | > 12680 > $ ccm node1 cqlsh -k ks1 -e "consistency; tracing on; select * from ks1.t1 > where key = 'mmullass';" > Current consistency level is ONE. 
> Now Tracing is enabled > key | value > ----------+-------------------- > mmullass | 0x0000000000000001 > (1 rows) > Tracing session: 3420ce90-97cb-11e7-8ea7-c1bd4d549501 > activity | > timestamp | source | source_elapsed > ----------------------------------------------------------------------------+----------------------------+-----------+---------------- > Execute CQL3 query | > 2017-09-13 00:01:06.681000 | 127.0.0.1 | 0 > Parsing select * from ks1.t1 where key = 'mmullass'; [SharedPool-Worker-1] | > 2017-09-13 00:01:06.681000 | 127.0.0.1 | 296 > Preparing statement [SharedPool-Worker-1] | > 2017-09-13 00:01:06.681000 | 127.0.0.1 | 561 > Executing single-partition query on t1 [SharedPool-Worker-2] | > 2017-09-13 00:01:06.682000 | 127.0.0.1 | 1056 > Acquiring sstable references [SharedPool-Worker-2] | > 2017-09-13 00:01:06.682000 | 127.0.0.1 | 1142 > Merging memtable contents [SharedPool-Worker-2] | > 2017-09-13 00:01:06.682000 | 127.0.0.1 | 1206 > Read 1 live and 0 tombstone cells [SharedPool-Worker-2] | > 2017-09-13 00:01:06.682000 | 127.0.0.1 | 1455 > Request complete | > 2017-09-13 00:01:06.682794 | 127.0.0.1 | 1794 > {noformat} > h3. 4. Start node2 and confirm node2 has no data. > {noformat} > $ ccm node2 start && ccm status > Cluster: 'cluster_3.0.14' > ------------------------- > node1: UP > node3: DOWN > node2: UP > $ ccm node2 nodetool flush > $ ls ~/.ccm/cluster_3.0.14/node2/data0/ks1/t1-*/*-Data.db > ls: /Users/hiwakaba/.ccm/cluster_3.0.14/node2/data0/ks1/t1-*/*-Data.db: No > such file or directory > {noformat} > h3. 5. Select the row from node2 and read repair works. > {noformat} > $ ccm node2 cqlsh -k ks1 -e "consistency; tracing on; select * from ks1.t1 > where key = 'mmullass';" > Current consistency level is ONE. 
> Now Tracing is enabled
>
>  key | value
> -----+-------
>
> (0 rows)
>
> Tracing session: 72a71fc0-97cb-11e7-83cc-a3af9d3da979
>
>  activity                                                                                 | timestamp                  | source    | source_elapsed
> ------------------------------------------------------------------------------------------+----------------------------+-----------+----------------
>                                                                        Execute CQL3 query | 2017-09-13 00:02:51.582000 | 127.0.0.2 |              0
>                Parsing select * from ks1.t1 where key = 'mmullass'; [SharedPool-Worker-2] | 2017-09-13 00:02:51.583000 | 127.0.0.2 |           1112
>                                                 Preparing statement [SharedPool-Worker-2] | 2017-09-13 00:02:51.583000 | 127.0.0.2 |           1412
>                                        reading data from /127.0.0.1 [SharedPool-Worker-2] | 2017-09-13 00:02:51.584000 | 127.0.0.2 |           2107
>                              Executing single-partition query on t1 [SharedPool-Worker-1] | 2017-09-13 00:02:51.585000 | 127.0.0.2 |           3492
>                 Sending READ message to /127.0.0.1 [MessagingService-Outgoing-/127.0.0.1] | 2017-09-13 00:02:51.585000 | 127.0.0.2 |           3516
>                                        Acquiring sstable references [SharedPool-Worker-1] | 2017-09-13 00:02:51.585000 | 127.0.0.2 |           3595
>                                           Merging memtable contents [SharedPool-Worker-1] | 2017-09-13 00:02:51.585001 | 127.0.0.2 |           3673
>                                   Read 0 live and 0 tombstone cells [SharedPool-Worker-1] | 2017-09-13 00:02:51.585001 | 127.0.0.2 |           3851
>              READ message received from /127.0.0.2 [MessagingService-Incoming-/127.0.0.2] | 2017-09-13 00:02:51.588000 | 127.0.0.1 |             33
>                                        Acquiring sstable references [SharedPool-Worker-2] | 2017-09-13 00:02:51.600000 | 127.0.0.1 |          12444
>                                           Merging memtable contents [SharedPool-Worker-2] | 2017-09-13 00:02:51.600000 | 127.0.0.1 |          12536
>                                   Read 1 live and 0 tombstone cells [SharedPool-Worker-2] | 2017-09-13 00:02:51.600000 | 127.0.0.1 |          12765
>                                    Enqueuing response to /127.0.0.2 [SharedPool-Worker-2] | 2017-09-13 00:02:51.600000 | 127.0.0.1 |          12929
>     Sending REQUEST_RESPONSE message to /127.0.0.2 [MessagingService-Outgoing-/127.0.0.2] | 2017-09-13 00:02:51.602000 | 127.0.0.1 |          14686
>  REQUEST_RESPONSE message received from /127.0.0.1 [MessagingService-Incoming-/127.0.0.1] | 2017-09-13 00:02:51.603000 | 127.0.0.2 |             --
>                                 Processing response from /127.0.0.1 [SharedPool-Worker-3] | 2017-09-13 00:02:51.610000 | 127.0.0.2 |             --
>                                              Initiating read-repair [SharedPool-Worker-3] | 2017-09-13 00:02:51.610000 | 127.0.0.2 |             --
>  Digest mismatch: org.apache.cassandra.service.DigestMismatchException: Mismatch for key DecoratedKey(-4886857781295767937, 6d6d756c6c617373) (d41d8cd98f00b204e9800998ecf8427e vs f8e0f9262a889cd3ebf4e5d50159757b) [ReadRepairStage:1] | 2017-09-13 00:02:51.624000 | 127.0.0.2 |             --
>                                                                          Request complete | 2017-09-13 00:02:51.586892 | 127.0.0.2 |           4892
> {noformat}
> h3. 6. As a result, node2 has the row.
> {noformat}
> $ ccm node2 cqlsh -k ks1 -e "consistency; tracing on; select * from ks1.t1 where key = 'mmullass';"
> Current consistency level is ONE.
> Now Tracing is enabled
>
>  key      | value
> ----------+--------------------
>  mmullass | 0x0000000000000001
>
> (1 rows)
>
> Tracing session: 78526330-97cb-11e7-83cc-a3af9d3da979
>
>  activity                                                                                 | timestamp                  | source    | source_elapsed
> ------------------------------------------------------------------------------------------+----------------------------+-----------+----------------
>                                                                        Execute CQL3 query | 2017-09-13 00:03:01.091000 | 127.0.0.2 |              0
>                Parsing select * from ks1.t1 where key = 'mmullass'; [SharedPool-Worker-3] | 2017-09-13 00:03:01.091000 | 127.0.0.2 |            216
>                                                 Preparing statement [SharedPool-Worker-3] | 2017-09-13 00:03:01.091000 | 127.0.0.2 |            390
>                                        reading data from /127.0.0.1 [SharedPool-Worker-3] | 2017-09-13 00:03:01.091000 | 127.0.0.2 |            808
>                              Executing single-partition query on t1 [SharedPool-Worker-2] | 2017-09-13 00:03:01.092000 | 127.0.0.2 |           1041
>              READ message received from /127.0.0.2 [MessagingService-Incoming-/127.0.0.2] | 2017-09-13 00:03:01.092000 | 127.0.0.1 |             33
>                 Sending READ message to /127.0.0.1 [MessagingService-Outgoing-/127.0.0.1] | 2017-09-13 00:03:01.092000 | 127.0.0.2 |           1036
>                              Executing single-partition query on t1 [SharedPool-Worker-1] | 2017-09-13 00:03:01.092000 | 127.0.0.1 |            189
>                                        Acquiring sstable references [SharedPool-Worker-2] | 2017-09-13 00:03:01.092000 | 127.0.0.2 |           1113
>                                        Acquiring sstable references [SharedPool-Worker-1] | 2017-09-13 00:03:01.092000 | 127.0.0.1 |            276
>                                           Merging memtable contents [SharedPool-Worker-2] | 2017-09-13 00:03:01.092000 | 127.0.0.2 |           1172
>                                           Merging memtable contents [SharedPool-Worker-1] | 2017-09-13 00:03:01.092000 | 127.0.0.1 |            332
>  REQUEST_RESPONSE message received from /127.0.0.1 [MessagingService-Incoming-/127.0.0.1] | 2017-09-13 00:03:01.093000 | 127.0.0.2 |             --
>                                   Read 1 live and 0 tombstone cells [SharedPool-Worker-1] | 2017-09-13 00:03:01.093000 | 127.0.0.1 |            565
>                                    Enqueuing response to /127.0.0.2 [SharedPool-Worker-1] | 2017-09-13 00:03:01.093000 | 127.0.0.1 |            648
>     Sending REQUEST_RESPONSE message to /127.0.0.2 [MessagingService-Outgoing-/127.0.0.2] | 2017-09-13 00:03:01.093000 | 127.0.0.1 |            783
>                                 Processing response from /127.0.0.1 [SharedPool-Worker-1] | 2017-09-13 00:03:01.094000 | 127.0.0.2 |             --
>                                              Initiating read-repair [SharedPool-Worker-1] | 2017-09-13 00:03:01.099000 | 127.0.0.2 |             --
>                                   Read 1 live and 0 tombstone cells [SharedPool-Worker-2] | 2017-09-13 00:03:01.101000 | 127.0.0.2 |          10113
>                                                                          Request complete | 2017-09-13 00:03:01.092830 | 127.0.0.2 |           1830
>
> $ ccm node2 nodetool flush
> $ ls ~/.ccm/cluster_3.0.14/node2/data0/ks1/t1-*/*-Data.db
> /Users/hiwakaba/.ccm/cluster_3.0.14/node2/data0/ks1/t1-ec659e0097ca11e78ea7c1bd4d549501/mc-1-big-Data.db
> $ ~/.ccm/repository/3.0.14/tools/bin/sstabledump /Users/hiwakaba/.ccm/cluster_3.0.14/node2/data0/ks1/t1-ec659e0097ca11e78ea7c1bd4d549501/mc-1-big-Data.db -k mmullass
> [
>   {
>     "partition" : {
>       "key" : [ "mmullass" ],
>       "position" : 0
>     },
>     "rows" : [
>       {
>         "type" : "row",
>         "position" : 36,
>         "liveness_info" : { "tstamp" : "2017-09-12T14:59:42.312969Z" },
>         "cells" : [
>           { "name" : "value", "value" : "0000000000000001" }
>         ]
>       }
>     ]
>   }
> ]
> {noformat}
> In [CASSANDRA-11409|https://issues.apache.org/jira/browse/CASSANDRA-11409], [~cam1982] commented that this was not a bug, so I filed this issue as an improvement.
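The "Digest mismatch" row in the step 5 trace can be sanity-checked offline. The sketch below is not part of the original report; it assumes the two values in parentheses are hex-encoded MD5 digests and that the second field of `DecoratedKey(...)` is the hex-encoded partition key. The helper names `decode_key` and `is_empty_md5` are hypothetical. One of the two digests equals the MD5 of empty input, which is consistent with one replica (presumably node2, which returned 0 rows) holding no data for the key; the trace itself does not say which digest belongs to which replica.

```python
import hashlib

# Values copied verbatim from the "Digest mismatch" row of the step 5 trace.
KEY_HEX = "6d6d756c6c617373"
FIRST_DIGEST = "d41d8cd98f00b204e9800998ecf8427e"
SECOND_DIGEST = "f8e0f9262a889cd3ebf4e5d50159757b"


def decode_key(key_hex: str) -> str:
    """Decode a hex-encoded partition key, assuming it is UTF-8 text."""
    return bytes.fromhex(key_hex).decode("utf-8")


def is_empty_md5(digest_hex: str) -> bool:
    """Return True if the digest equals the MD5 of zero bytes of input."""
    return digest_hex == hashlib.md5(b"").hexdigest()


if __name__ == "__main__":
    print("partition key:", decode_key(KEY_HEX))
    print("first digest is empty-input MD5:", is_empty_md5(FIRST_DIGEST))
    print("second digest is empty-input MD5:", is_empty_md5(SECOND_DIGEST))
```

If the assumptions hold, the decoded key matches the one used in the `select` statement, and only the first digest corresponds to an empty result.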