[ https://issues.apache.org/jira/browse/CASSANDRA-13863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Murukesh Mohanan updated CASSANDRA-13863:
-----------------------------------------
    Attachment: 0001-Use-read_repair_chance-when-starting-repairs-due-to-.patch

As a quick fix I tried using {{read_repair_chance}} in the exception handler for {{DigestMismatchException}}. After running benchmarks with YCSB ({{workloada}} with default settings, so {{read_repair_chance}} is 0) on 3.0.8, 3.0.9, 3.0.12, and 3.0.12 with the patch, the results averaged across ~50 runs are as follows:

3.0.8:
{code}
[OVERALL], RunTime(ms), 5287.62
[OVERALL], Throughput(ops/sec), 189.70
[TOTAL_GCS_PS_Scavenge], Count, 1
[TOTAL_GC_TIME_PS_Scavenge], Time(ms), 14.47
[TOTAL_GC_TIME_%_PS_Scavenge], Time(%), 0.27
[TOTAL_GCS_PS_MarkSweep], Count, 0
[TOTAL_GC_TIME_PS_MarkSweep], Time(ms), 0
[TOTAL_GC_TIME_%_PS_MarkSweep], Time(%), 0
[TOTAL_GCs], Count, 1
[TOTAL_GC_TIME], Time(ms), 14.47
[TOTAL_GC_TIME_%], Time(%), 0.27
[READ], Operations, 502.55
[READ], AverageLatency(us), 2701.96
[READ], MinLatency(us), 1144.75
[READ], MaxLatency(us), 21410.62
[READ], 95thPercentileLatency(us), 4606.09
[READ], 99thPercentileLatency(us), 8593.26
[READ], Return=OK, 502.55
[CLEANUP], Operations, 1
[CLEANUP], AverageLatency(us), 2230368.60
[CLEANUP], MinLatency(us), 2229344.60
[CLEANUP], MaxLatency(us), 2231391.60
[CLEANUP], 95thPercentileLatency(us), 2231391.60
[CLEANUP], 99thPercentileLatency(us), 2231391.60
[UPDATE], Operations, 497.45
[UPDATE], AverageLatency(us), 2118.83
[UPDATE], MinLatency(us), 976.21
[UPDATE], MaxLatency(us), 21953.26
[UPDATE], 95thPercentileLatency(us), 3519.23
[UPDATE], 99thPercentileLatency(us), 7775.53
[UPDATE], Return=OK, 497.45
{code}

3.0.9:
{code}
[OVERALL], RunTime(ms), 5269.64
[OVERALL], Throughput(ops/sec), 190.36
[TOTAL_GCS_PS_Scavenge], Count, 1
[TOTAL_GC_TIME_PS_Scavenge], Time(ms), 14.26
[TOTAL_GC_TIME_%_PS_Scavenge], Time(%), 0.27
[TOTAL_GCS_PS_MarkSweep], Count, 0
[TOTAL_GC_TIME_PS_MarkSweep], Time(ms), 0
[TOTAL_GC_TIME_%_PS_MarkSweep], Time(%), 0
[TOTAL_GCs], Count, 1
[TOTAL_GC_TIME], Time(ms), 14.26
[TOTAL_GC_TIME_%], Time(%), 0.27
[READ], Operations, 499.26
[READ], AverageLatency(us), 2673.89
[READ], MinLatency(us), 1141.89
[READ], MaxLatency(us), 21053.04
[READ], 95thPercentileLatency(us), 4392.28
[READ], 99thPercentileLatency(us), 8742.70
[READ], Return=OK, 499.26
[CLEANUP], Operations, 1
[CLEANUP], AverageLatency(us), 2230214.04
[CLEANUP], MinLatency(us), 2229190.04
[CLEANUP], MaxLatency(us), 2231237.04
[CLEANUP], 95thPercentileLatency(us), 2231237.04
[CLEANUP], 99thPercentileLatency(us), 2231237.04
[UPDATE], Operations, 500.74
[UPDATE], AverageLatency(us), 2106.96
[UPDATE], MinLatency(us), 967.11
[UPDATE], MaxLatency(us), 21862.40
[UPDATE], 95thPercentileLatency(us), 3477.83
[UPDATE], 99thPercentileLatency(us), 7677.11
[UPDATE], Return=OK, 500.74
{code}

3.0.12:
{code}
[OVERALL], RunTime(ms), 5425.13
[OVERALL], Throughput(ops/sec), 184.86
[TOTAL_GCS_PS_Scavenge], Count, 1
[TOTAL_GC_TIME_PS_Scavenge], Time(ms), 17.42
[TOTAL_GC_TIME_%_PS_Scavenge], Time(%), 0.32
[TOTAL_GCS_PS_MarkSweep], Count, 0
[TOTAL_GC_TIME_PS_MarkSweep], Time(ms), 0
[TOTAL_GC_TIME_%_PS_MarkSweep], Time(%), 0
[TOTAL_GCs], Count, 1
[TOTAL_GC_TIME], Time(ms), 17.42
[TOTAL_GC_TIME_%], Time(%), 0.32
[READ], Operations, 500.49
[READ], AverageLatency(us), 2805.40
[READ], MinLatency(us), 1158.47
[READ], MaxLatency(us), 24314.62
[READ], 95thPercentileLatency(us), 4903.83
[READ], 99thPercentileLatency(us), 9662.70
[READ], Return=OK, 500.49
[CLEANUP], Operations, 1
[CLEANUP], AverageLatency(us), 2230716.38
[CLEANUP], MinLatency(us), 2229692.38
[CLEANUP], MaxLatency(us), 2231739.38
[CLEANUP], 95thPercentileLatency(us), 2231739.38
[CLEANUP], 99thPercentileLatency(us), 2231739.38
[UPDATE], Operations, 499.51
[UPDATE], AverageLatency(us), 2225.51
[UPDATE], MinLatency(us), 971.92
[UPDATE], MaxLatency(us), 23552.06
[UPDATE], 95thPercentileLatency(us), 3822.02
[UPDATE], 99thPercentileLatency(us), 9153.19
[UPDATE], Return=OK, 499.51
{code}

3.0.12 with patch:
{code}
[OVERALL], RunTime(ms), 5128.40
[OVERALL], Throughput(ops/sec), 195.93
[TOTAL_GCS_PS_Scavenge], Count, 1
[TOTAL_GC_TIME_PS_Scavenge], Time(ms), 12.13
[TOTAL_GC_TIME_%_PS_Scavenge], Time(%), 0.24
[TOTAL_GCS_PS_MarkSweep], Count, 0
[TOTAL_GC_TIME_PS_MarkSweep], Time(ms), 0
[TOTAL_GC_TIME_%_PS_MarkSweep], Time(%), 0
[TOTAL_GCs], Count, 1
[TOTAL_GC_TIME], Time(ms), 12.13
[TOTAL_GC_TIME_%], Time(%), 0.24
[READ], Operations, 500.79
[READ], AverageLatency(us), 2557.40
[READ], MinLatency(us), 1081.06
[READ], MaxLatency(us), 21607.91
[READ], 95thPercentileLatency(us), 4195.49
[READ], 99thPercentileLatency(us), 7990.74
[READ], Return=OK, 500.79
[CLEANUP], Operations, 1
[CLEANUP], AverageLatency(us), 2229325.28
[CLEANUP], MinLatency(us), 2228301.28
[CLEANUP], MaxLatency(us), 2230348.28
[CLEANUP], 95thPercentileLatency(us), 2230348.28
[CLEANUP], 99thPercentileLatency(us), 2230348.28
[UPDATE], Operations, 499.21
[UPDATE], AverageLatency(us), 1995.85
[UPDATE], MinLatency(us), 920.19
[UPDATE], MaxLatency(us), 19581.04
[UPDATE], 95thPercentileLatency(us), 3308.17
[UPDATE], 99thPercentileLatency(us), 6704.85
[UPDATE], Return=OK, 499.21
{code}

The benchmarks were run on ccm clusters on an MBP, so they probably cannot be compared with the stats provided by Shogo. At any rate, there is a marked increase in the 99th-percentile latency with 3.0.12, and it returns to normal when read repair is disabled.

> Speculative retry causes read repair even if read_repair_chance is 0.0.
> -----------------------------------------------------------------------
>
>                 Key: CASSANDRA-13863
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13863
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Hiro Wakabayashi
>         Attachments: 0001-Use-read_repair_chance-when-starting-repairs-due-to-.patch, speculative retries.pdf
>
>
> {{read_repair_chance = 0.0}} and {{dclocal_read_repair_chance = 0.0}} should cause no read repair, but read repair happens with speculative retry.
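For illustration, the quick fix described in the comment above (consulting {{read_repair_chance}} in the handler for {{DigestMismatchException}}) can be sketched as a small, self-contained Java model. This is not the actual Cassandra code or the attached patch; the class and method names here are hypothetical, and it only models the decision "repair with probability {{read_repair_chance}} instead of unconditionally":

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical model of the quick fix: when a digest mismatch is
// detected (e.g. during a speculative-retry read), only initiate a
// read repair with probability read_repair_chance, rather than always.
public class DigestMismatchRepairPolicy {

    /** Decide whether a digest mismatch should trigger a read repair. */
    static boolean shouldRepairOnDigestMismatch(double readRepairChance) {
        // nextDouble() is uniform in [0, 1): with chance 0.0 this is
        // never true (no repair), with chance 1.0 it is always true.
        return ThreadLocalRandom.current().nextDouble() < readRepairChance;
    }

    public static void main(String[] args) {
        // read_repair_chance = 0.0: a digest mismatch never starts a
        // repair, which matches the reporter's expectation.
        for (int i = 0; i < 1000; i++) {
            assert !shouldRepairOnDigestMismatch(0.0);
        }
        // read_repair_chance = 1.0: a mismatch always starts a repair.
        for (int i = 0; i < 1000; i++) {
            assert shouldRepairOnDigestMismatch(1.0);
        }
        System.out.println("ok");
    }
}
```

Run with assertions enabled ({{java -ea DigestMismatchRepairPolicy}}). The real change would live wherever the digest mismatch is caught on the read path; the sketch only captures the probabilistic gate.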
> I think {{read_repair_chance = 0.0}} and {{dclocal_read_repair_chance = 0.0}} should stop read repair completely, because the user wants to stop read repair in some cases.
>
> {panel:title=Case 1: TWCS users}
> The [documentation|http://cassandra.apache.org/doc/latest/operating/compaction.html?highlight=read_repair_chance] states how to disable read repair.
> {quote}While TWCS tries to minimize the impact of comingled data, users should attempt to avoid this behavior. Specifically, users should avoid queries that explicitly set the timestamp via CQL USING TIMESTAMP. Additionally, users should run frequent repairs (which streams data in such a way that it does not become comingled), and disable background read repair by setting the table’s read_repair_chance and dclocal_read_repair_chance to 0.
> {quote}
> {panel}
>
> {panel:title=Case 2: Strict SLA for read latency}
> At peak times, read latency is key for us, but read repair causes higher latency than no read repair. We can use anti-entropy repair at off-peak times for consistency.
> {panel}
>
> Here is my procedure to reproduce the problem.
>
> h3. 1. Create a cluster and set {{hinted_handoff_enabled}} to false.
> {noformat}
> $ ccm create -v 3.0.14 -n 3 cluster_3.0.14
> $ for h in $(seq 1 3) ; do perl -pi -e 's/hinted_handoff_enabled: true/hinted_handoff_enabled: false/' ~/.ccm/cluster_3.0.14/node$h/conf/cassandra.yaml ; done
> $ for h in $(seq 1 3) ; do grep "hinted_handoff_enabled:" ~/.ccm/cluster_3.0.14/node$h/conf/cassandra.yaml ; done
> hinted_handoff_enabled: false
> hinted_handoff_enabled: false
> hinted_handoff_enabled: false
> $ ccm start
> {noformat}
>
> h3. 2. Create a keyspace and a table.
> {noformat}
> $ ccm node1 cqlsh
> DROP KEYSPACE IF EXISTS ks1;
> CREATE KEYSPACE ks1 WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '3'} AND durable_writes = true;
> CREATE TABLE ks1.t1 (
>     key text PRIMARY KEY,
>     value blob
> ) WITH bloom_filter_fp_chance = 0.01
>     AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>     AND comment = ''
>     AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
>     AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
>     AND crc_check_chance = 1.0
>     AND dclocal_read_repair_chance = 0.0
>     AND default_time_to_live = 0
>     AND gc_grace_seconds = 864000
>     AND max_index_interval = 2048
>     AND memtable_flush_period_in_ms = 0
>     AND min_index_interval = 128
>     AND read_repair_chance = 0.0
>     AND speculative_retry = 'ALWAYS';
> QUIT;
> {noformat}
>
> h3. 3. Stop node2 and node3. Insert a row.
> {noformat}
> $ ccm node3 stop && ccm node2 stop && ccm status
> Cluster: 'cluster_3.0.14'
> -------------------------
> node1: UP
> node3: DOWN
> node2: DOWN
> $ ccm node1 cqlsh -k ks1 -e "consistency; tracing on; insert into ks1.t1 (key, value) values ('mmullass', bigintAsBlob(1));"
> Current consistency level is ONE.
> Now Tracing is enabled
>
> Tracing session: 01d74590-97cb-11e7-8ea7-c1bd4d549501
>
>  activity                                                                                            | timestamp                  | source    | source_elapsed
> -----------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------
>  Execute CQL3 query                                                                                  | 2017-09-12 23:59:42.316000 | 127.0.0.1 |              0
>  Parsing insert into ks1.t1 (key, value) values ('mmullass', bigintAsBlob(1)); [SharedPool-Worker-1] | 2017-09-12 23:59:42.319000 | 127.0.0.1 |           4323
>  Preparing statement [SharedPool-Worker-1]                                                           | 2017-09-12 23:59:42.320000 | 127.0.0.1 |           5250
>  Determining replicas for mutation [SharedPool-Worker-1]                                             | 2017-09-12 23:59:42.327000 | 127.0.0.1 |          11886
>  Appending to commitlog [SharedPool-Worker-3]                                                        | 2017-09-12 23:59:42.327000 | 127.0.0.1 |          12195
>  Adding to t1 memtable [SharedPool-Worker-3]                                                         | 2017-09-12 23:59:42.327000 | 127.0.0.1 |          12392
>  Request complete                                                                                    | 2017-09-12 23:59:42.328680 | 127.0.0.1 |          12680
>
> $ ccm node1 cqlsh -k ks1 -e "consistency; tracing on; select * from ks1.t1 where key = 'mmullass';"
> Current consistency level is ONE.
> Now Tracing is enabled
>
>  key      | value
> ----------+--------------------
>  mmullass | 0x0000000000000001
>
> (1 rows)
>
> Tracing session: 3420ce90-97cb-11e7-8ea7-c1bd4d549501
>
>  activity                                                                   | timestamp                  | source    | source_elapsed
> ----------------------------------------------------------------------------+----------------------------+-----------+----------------
>  Execute CQL3 query                                                         | 2017-09-13 00:01:06.681000 | 127.0.0.1 |              0
>  Parsing select * from ks1.t1 where key = 'mmullass'; [SharedPool-Worker-1] | 2017-09-13 00:01:06.681000 | 127.0.0.1 |            296
>  Preparing statement [SharedPool-Worker-1]                                  | 2017-09-13 00:01:06.681000 | 127.0.0.1 |            561
>  Executing single-partition query on t1 [SharedPool-Worker-2]               | 2017-09-13 00:01:06.682000 | 127.0.0.1 |           1056
>  Acquiring sstable references [SharedPool-Worker-2]                         | 2017-09-13 00:01:06.682000 | 127.0.0.1 |           1142
>  Merging memtable contents [SharedPool-Worker-2]                            | 2017-09-13 00:01:06.682000 | 127.0.0.1 |           1206
>  Read 1 live and 0 tombstone cells [SharedPool-Worker-2]                    | 2017-09-13 00:01:06.682000 | 127.0.0.1 |           1455
>  Request complete                                                           | 2017-09-13 00:01:06.682794 | 127.0.0.1 |           1794
> {noformat}
>
> h3. 4. Start node2 and confirm node2 has no data.
> {noformat}
> $ ccm node2 start && ccm status
> Cluster: 'cluster_3.0.14'
> -------------------------
> node1: UP
> node3: DOWN
> node2: UP
> $ ccm node2 nodetool flush
> $ ls ~/.ccm/cluster_3.0.14/node2/data0/ks1/t1-*/*-Data.db
> ls: /Users/hiwakaba/.ccm/cluster_3.0.14/node2/data0/ks1/t1-*/*-Data.db: No such file or directory
> {noformat}
>
> h3. 5. Select the row from node2 and read repair works.
> {noformat}
> $ ccm node2 cqlsh -k ks1 -e "consistency; tracing on; select * from ks1.t1 where key = 'mmullass';"
> Current consistency level is ONE.
> Now Tracing is enabled
>
>  key | value
> -----+-------
>
> (0 rows)
>
> Tracing session: 72a71fc0-97cb-11e7-83cc-a3af9d3da979
>
>  activity                                                                                  | timestamp                  | source    | source_elapsed
> -------------------------------------------------------------------------------------------+----------------------------+-----------+----------------
>  Execute CQL3 query                                                                        | 2017-09-13 00:02:51.582000 | 127.0.0.2 |              0
>  Parsing select * from ks1.t1 where key = 'mmullass'; [SharedPool-Worker-2]                | 2017-09-13 00:02:51.583000 | 127.0.0.2 |           1112
>  Preparing statement [SharedPool-Worker-2]                                                 | 2017-09-13 00:02:51.583000 | 127.0.0.2 |           1412
>  reading data from /127.0.0.1 [SharedPool-Worker-2]                                        | 2017-09-13 00:02:51.584000 | 127.0.0.2 |           2107
>  Executing single-partition query on t1 [SharedPool-Worker-1]                              | 2017-09-13 00:02:51.585000 | 127.0.0.2 |           3492
>  Sending READ message to /127.0.0.1 [MessagingService-Outgoing-/127.0.0.1]                 | 2017-09-13 00:02:51.585000 | 127.0.0.2 |           3516
>  Acquiring sstable references [SharedPool-Worker-1]                                        | 2017-09-13 00:02:51.585000 | 127.0.0.2 |           3595
>  Merging memtable contents [SharedPool-Worker-1]                                           | 2017-09-13 00:02:51.585001 | 127.0.0.2 |           3673
>  Read 0 live and 0 tombstone cells [SharedPool-Worker-1]                                   | 2017-09-13 00:02:51.585001 | 127.0.0.2 |           3851
>  READ message received from /127.0.0.2 [MessagingService-Incoming-/127.0.0.2]              | 2017-09-13 00:02:51.588000 | 127.0.0.1 |             33
>  Acquiring sstable references [SharedPool-Worker-2]                                        | 2017-09-13 00:02:51.600000 | 127.0.0.1 |          12444
>  Merging memtable contents [SharedPool-Worker-2]                                           | 2017-09-13 00:02:51.600000 | 127.0.0.1 |          12536
>  Read 1 live and 0 tombstone cells [SharedPool-Worker-2]                                   | 2017-09-13 00:02:51.600000 | 127.0.0.1 |          12765
>  Enqueuing response to /127.0.0.2 [SharedPool-Worker-2]                                    | 2017-09-13 00:02:51.600000 | 127.0.0.1 |          12929
>  Sending REQUEST_RESPONSE message to /127.0.0.2 [MessagingService-Outgoing-/127.0.0.2]     | 2017-09-13 00:02:51.602000 | 127.0.0.1 |          14686
>  REQUEST_RESPONSE message received from /127.0.0.1 [MessagingService-Incoming-/127.0.0.1]  | 2017-09-13 00:02:51.603000 | 127.0.0.2 |             --
>  Processing response from /127.0.0.1 [SharedPool-Worker-3]                                 | 2017-09-13 00:02:51.610000 | 127.0.0.2 |             --
>  Initiating read-repair [SharedPool-Worker-3]                                              | 2017-09-13 00:02:51.610000 | 127.0.0.2 |             --
>  Digest mismatch: org.apache.cassandra.service.DigestMismatchException: Mismatch for key DecoratedKey(-4886857781295767937, 6d6d756c6c617373) (d41d8cd98f00b204e9800998ecf8427e vs f8e0f9262a889cd3ebf4e5d50159757b) [ReadRepairStage:1] | 2017-09-13 00:02:51.624000 | 127.0.0.2 | --
>  Request complete                                                                          | 2017-09-13 00:02:51.586892 | 127.0.0.2 |           4892
> {noformat}
>
> h3. 6. As a result, node2 has the row.
> {noformat}
> $ ccm node2 cqlsh -k ks1 -e "consistency; tracing on; select * from ks1.t1 where key = 'mmullass';"
> Current consistency level is ONE.
> Now Tracing is enabled
>
>  key      | value
> ----------+--------------------
>  mmullass | 0x0000000000000001
>
> (1 rows)
>
> Tracing session: 78526330-97cb-11e7-83cc-a3af9d3da979
>
>  activity                                                                                 | timestamp                  | source    | source_elapsed
> ------------------------------------------------------------------------------------------+----------------------------+-----------+----------------
>  Execute CQL3 query                                                                       | 2017-09-13 00:03:01.091000 | 127.0.0.2 |              0
>  Parsing select * from ks1.t1 where key = 'mmullass'; [SharedPool-Worker-3]               | 2017-09-13 00:03:01.091000 | 127.0.0.2 |            216
>  Preparing statement [SharedPool-Worker-3]                                                | 2017-09-13 00:03:01.091000 | 127.0.0.2 |            390
>  reading data from /127.0.0.1 [SharedPool-Worker-3]                                       | 2017-09-13 00:03:01.091000 | 127.0.0.2 |            808
>  Executing single-partition query on t1 [SharedPool-Worker-2]                             | 2017-09-13 00:03:01.092000 | 127.0.0.2 |           1041
>  READ message received from /127.0.0.2 [MessagingService-Incoming-/127.0.0.2]             | 2017-09-13 00:03:01.092000 | 127.0.0.1 |             33
>  Sending READ message to /127.0.0.1 [MessagingService-Outgoing-/127.0.0.1]                | 2017-09-13 00:03:01.092000 | 127.0.0.2 |           1036
>  Executing single-partition query on t1 [SharedPool-Worker-1]                             | 2017-09-13 00:03:01.092000 | 127.0.0.1 |            189
>  Acquiring sstable references [SharedPool-Worker-2]                                       | 2017-09-13 00:03:01.092000 | 127.0.0.2 |           1113
>  Acquiring sstable references [SharedPool-Worker-1]                                       | 2017-09-13 00:03:01.092000 | 127.0.0.1 |            276
>  Merging memtable contents [SharedPool-Worker-2]                                          | 2017-09-13 00:03:01.092000 | 127.0.0.2 |           1172
>  Merging memtable contents [SharedPool-Worker-1]                                          | 2017-09-13 00:03:01.092000 | 127.0.0.1 |            332
>  REQUEST_RESPONSE message received from /127.0.0.1 [MessagingService-Incoming-/127.0.0.1] | 2017-09-13 00:03:01.093000 | 127.0.0.2 |             --
>  Read 1 live and 0 tombstone cells [SharedPool-Worker-1]                                  | 2017-09-13 00:03:01.093000 | 127.0.0.1 |            565
>  Enqueuing response to /127.0.0.2 [SharedPool-Worker-1]                                   | 2017-09-13 00:03:01.093000 | 127.0.0.1 |            648
>  Sending REQUEST_RESPONSE message to /127.0.0.2 [MessagingService-Outgoing-/127.0.0.2]    | 2017-09-13 00:03:01.093000 | 127.0.0.1 |            783
>  Processing response from /127.0.0.1 [SharedPool-Worker-1]                                | 2017-09-13 00:03:01.094000 | 127.0.0.2 |             --
>  Initiating read-repair [SharedPool-Worker-1]                                             | 2017-09-13 00:03:01.099000 | 127.0.0.2 |             --
>  Read 1 live and 0 tombstone cells [SharedPool-Worker-2]                                  | 2017-09-13 00:03:01.101000 | 127.0.0.2 |          10113
>  Request complete                                                                         | 2017-09-13 00:03:01.092830 | 127.0.0.2 |           1830
>
> $ ccm node2 nodetool flush
> $ ls ~/.ccm/cluster_3.0.14/node2/data0/ks1/t1-*/*-Data.db
> /Users/hiwakaba/.ccm/cluster_3.0.14/node2/data0/ks1/t1-ec659e0097ca11e78ea7c1bd4d549501/mc-1-big-Data.db
> $ ~/.ccm/repository/3.0.14/tools/bin/sstabledump /Users/hiwakaba/.ccm/cluster_3.0.14/node2/data0/ks1/t1-ec659e0097ca11e78ea7c1bd4d549501/mc-1-big-Data.db -k mmullass
> [
>   {
>     "partition" : {
>       "key" : [ "mmullass" ],
>       "position" : 0
>     },
>     "rows" : [
>       {
>         "type" : "row",
>         "position" : 36,
>         "liveness_info" : { "tstamp" : "2017-09-12T14:59:42.312969Z" },
>         "cells" : [
>           { "name" : "value", "value" : "0000000000000001" }
>         ]
>       }
>     ]
>   }
> ]
> {noformat}
>
> In [CASSANDRA-11409|https://issues.apache.org/jira/browse/CASSANDRA-11409], [~cam1982] commented that this was not a bug, so I filed this issue as an improvement.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org