Wei Zhu created CASSANDRA-8819:
----------------------------------
Summary: LOCAL_QUORUM writes returns wrong message
Key: CASSANDRA-8819
URL: https://issues.apache.org/jira/browse/CASSANDRA-8819
Project: Cassandra
Issue Type: Bug
Components: Core
Environment: CentOS 6.6
Reporter: Wei Zhu
Fix For: 2.0.8
We have two DC3, each with 7 nodes.
Here is the keyspace setup:
create keyspace test
with placement_strategy = 'NetworkTopologyStrategy'
and strategy_options = {DC2 : 3, DC1 : 3}
and durable_writes = true;
We brought down two nodes in DC2 for maintenance. We only write to DC1 using
local_quroum (using datastax JavaClient)
But we see this errors in the log:
Cassandra timeout during write query at consistency LOCAL_QUORUM (4 replica
were required but only 3 acknowledged the write
why does it say 4 replica were required? and Why would it give error back to
client since local_quorum should succeed.
Here are the output from nodetool status
Note: Ownership information does not include topology; for complete
information, specify a keyspace
Datacenter: DC2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID
Rack
UN 10.2.0.1 10.92 GB 256 7.9% XXXXXXXXXXXXXXXXXXXXXXXXXXXX RAC206
UN 10.2.0.2 6.17 GB 256 8.0% XXXXXXXXXXXXXXXXXXXXXXXXXXXX RAC106
UN 10.2.0.3 6.63 GB 256 7.3% XXXXXXXXXXXXXXXXXXXXXXXXXXXX RAC107
DL 10.2.0.4 1.54 GB 256 7.7% XXXXXXXXXXXXXXXXXXXXXXXXXXXX RAC107
UN 10.2.0.5 6.02 GB 256 6.6% XXXXXXXXXXXXXXXXXXXXXXXXXXXX RAC106
UJ 10.2.0.6 3.68 GB 256 ? XXXXXXXXXXXXXXXXXXXXXXXXXXXX RAC205
UN 10.2.0.7 7.22 GB 256 7.7% XXXXXXXXXXXXXXXXXXXXXXXXXXXX RAC205
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID
Rack
UN 10.1.0.1 6.04 GB 256 8.6% XXXXXXXXXXXXXXXXXXXXXXXXXXXX RAC10
UN 10.1.0.2 7.55 GB 256 7.4% XXXXXXXXXXXXXXXXXXXXXXXXXXXX RAC8
UN 10.1.0.3 5.83 GB 256 7.0% XXXXXXXXXXXXXXXXXXXXXXXXXXXX RAC9
UN 10.1.0.4 7.34 GB 256 7.9% XXXXXXXXXXXXXXXXXXXXXXXXXXXX RAC6
UN 10.1.0.5 7.57 GB 256 8.0% XXXXXXXXXXXXXXXXXXXXXXXXXXXX RAC7
UN 10.1.0.6 5.31 GB 256 7.3% XXXXXXXXXXXXXXXXXXXXXXXXXXXX RAC10
UN 10.1.0.7 5.47 GB 256 8.6% XXXXXXXXXXXXXXXXXXXXXXXXXXXX RAC9
I did a cql trace on the query and here is the trace, and it does say
Write timeout; received 3 of 4 required replies | 17:27:52,831 | 10.1.0.1 |
2002873
at the end. I guess that is where the client gets the error from.
Tracing session: 5a789fb0-b70d-11e4-8fca-99bff9c19890
activity
| timestamp |
source | source_elapsed
---------------------------------------------------------------------------------------------------------------------------------------------+--------------+-------------+----------------
execute_cql3_query | 17:27:50,828 |
10.1.0.1 | 0
Parsing insert into test (user_id, created, event_data, event_id)values (
123456789 , 9eab8950-b70c-11e4-8fca-99bff9c19891, 'test', '16'); | 17:27:50,828
| 10.1.0.1 | 39
Preparing statement | 17:27:50,828 |
10.1.0.1 | 135
Message received from /10.1.0.1 | 17:27:50,829 |
10.1.0.5 | 25
Sending message to /10.1.0.5 | 17:27:50,829 |
10.1.0.1 | 421
Executing single-partition query on users | 17:27:50,829 |
10.1.0.5 | 177
Acquiring sstable references | 17:27:50,829 |
10.1.0.5 | 191
Merging memtable tombstones | 17:27:50,830 |
10.1.0.5 | 208
Message received from /10.1.0.5 | 17:27:50,830 |
10.1.0.1 | 1461
Message received from /10.1.0.1 | 17:27:50,830 |
10.1.0.2 | 14
Key cache hit for sstable 1 | 17:27:50,830 |
10.1.0.5 | 254
Processing response from /10.1.0.5 | 17:27:50,830 |
10.1.0.1 | 1514
Executing single-partition query on users | 17:27:50,830 |
10.1.0.2 | 78
Seeking to partition beginning in data file | 17:27:50,830 |
10.1.0.5 | 264
Sending message to /10.1.0.2 | 17:27:50,830 |
10.1.0.1 | 1517
Acquiring sstable references | 17:27:50,830 |
10.1.0.2 | 85
Skipped 0/1
non-slice-intersecting sstables, included 0 due to tombstones | 17:27:50,830 |
10.1.0.5 | 453
Determining replicas for mutation | 17:27:50,830 |
10.1.0.1 | 1746
Merging memtable tombstones | 17:27:50,830 |
10.1.0.2 | 97
Merging data from memtables and 1 sstables | 17:27:50,830 |
10.1.0.5 | 476
Message received from /10.1.0.2 | 17:27:50,830 |
10.1.0.1 | 2369
Key cache hit for sstable 2 | 17:27:50,830 |
10.1.0.2 | 120
Read 1 live and 0 tombstoned cells | 17:27:50,830 |
10.1.0.5 | 506
Seeking to partition beginning in data file | 17:27:50,830 |
10.1.0.2 | 123
Enqueuing response to /10.1.0.1 | 17:27:50,830 |
10.1.0.5 | 570
Skipped 0/1
non-slice-intersecting sstables, included 0 due to tombstones | 17:27:50,830 |
10.1.0.2 | 288
Sending message to /10.1.0.1 | 17:27:50,830 |
10.1.0.5 | 617
Merging data from memtables and 1 sstables | 17:27:50,830 |
10.1.0.2 | 297
Read 1 live and 0 tombstoned cells | 17:27:50,830 |
10.1.0.2 | 319
Enqueuing response to /10.1.0.1 | 17:27:50,830 |
10.1.0.2 | 362
Sending message to /10.1.0.1 | 17:27:50,830 |
10.1.0.2 | 420
Message received from /10.1.0.1 | 17:27:50,831 |
10.1.0.4 | 17
Sending message to /10.1.0.2 | 17:27:50,831 |
10.1.0.1 | 2435
Message received from /10.1.0.1 | 17:27:50,831 |
10.1.0.2 | 8
Acquiring switchLock read lock | 17:27:50,831 |
10.1.0.4 | 61
Processing response from /10.1.0.2 | 17:27:50,831 |
10.1.0.1 | 2488
Acquiring switchLock read lock | 17:27:50,831 |
10.1.0.2 | 44
Appending to commitlog | 17:27:50,831 |
10.1.0.4 | 78
Not hinting /10.2.0.4 which has been down 364809650ms | 17:27:50,831 |
10.1.0.1 | 2503
Appending to commitlog | 17:27:50,831 |
10.1.0.2 | 62
Adding to event memtable | 17:27:50,831 |
10.1.0.4 | 96
Sending message to /10.1.0.4 | 17:27:50,831 |
10.1.0.1 | 2557
Adding to event memtable | 17:27:50,831 |
10.1.0.2 | 80
Enqueuing response to /10.1.0.1 | 17:27:50,831 |
10.1.0.4 | 177
Acquiring switchLock read lock | 17:27:50,831 |
10.1.0.1 | 2669
Enqueuing response to /10.1.0.1 | 17:27:50,831 |
10.1.0.2 | 160
Sending message to /10.1.0.1 | 17:27:50,831 |
10.1.0.4 | 333
Appending to commitlog | 17:27:50,831 |
10.1.0.1 | 2682
Sending message to /10.1.0.1 | 17:27:50,831 |
10.1.0.2 | 266
Adding to event memtable | 17:27:50,831 |
10.1.0.1 | 2720
Sending message to /10.2.0.5 | 17:27:50,831 |
10.1.0.1 | 2758
Message received from /10.1.0.2 | 17:27:50,831 |
10.1.0.1 | 2989
Processing response from /10.1.0.2 | 17:27:50,831 |
10.1.0.1 | 3024
Message received from /10.1.0.4 | 17:27:50,832 |
10.1.0.1 | 3764
Processing response from /10.1.0.4 | 17:27:50,832 |
10.1.0.1 | 3805
Message received from /10.1.0.1 | 17:27:50,841 |
10.2.0.5 | 24
Enqueuing forwarded write to /10.2.0.7 | 17:27:50,841 |
10.2.0.5 | 255
Enqueuing forwarded write to /10.2.0.3 | 17:27:50,841 |
10.2.0.5 | 283
Enqueuing forwarded write to /10.2.0.6 | 17:27:50,841 |
10.2.0.5 | 307
Acquiring switchLock read lock | 17:27:50,841 |
10.2.0.5 | 362
Appending to commitlog | 17:27:50,841 |
10.2.0.5 | 380
Sending message to /10.2.0.7 | 17:27:50,841 |
10.2.0.5 | 382
Adding to event memtable | 17:27:50,841 |
10.2.0.5 | 411
Sending message to /10.2.0.6 | 17:27:50,841 |
10.2.0.5 | 429
Enqueuing response to /10.1.0.1 | 17:27:50,841 |
10.2.0.5 | 484
Sending message to /10.2.0.3 | 17:27:50,841 |
10.2.0.5 | 561
Sending message to /10.1.0.1 | 17:27:50,841 |
10.2.0.5 | 625
Acquiring switchLock read lock | 17:27:50,842 |
10.2.0.6 | 1031
Acquiring switchLock read lock | 17:27:50,842 |
10.2.0.7 | 178
Appending to commitlog | 17:27:50,842 |
10.2.0.6 | 1066
Appending to commitlog | 17:27:50,842 |
10.2.0.7 | 196
Adding to event memtable | 17:27:50,842 |
10.2.0.6 | 1118
Adding to event memtable | 17:27:50,842 |
10.2.0.7 | 231
Enqueuing response to /10.1.0.1 | 17:27:50,842 |
10.2.0.6 | 1181
Enqueuing response to /10.1.0.1 | 17:27:50,842 |
10.2.0.7 | 286
Sending message to /10.1.0.1 | 17:27:50,842 |
10.2.0.7 | 421
Acquiring switchLock read lock | 17:27:50,843 |
10.2.0.3 | 1216
Sending message to /10.1.0.1 | 17:27:50,843 |
10.2.0.6 | 1382
Appending to commitlog | 17:27:50,843 |
10.2.0.3 | 1239
Adding to event memtable | 17:27:50,843 |
10.2.0.3 | 1313
Enqueuing response to /10.1.0.1 | 17:27:50,843 |
10.2.0.3 | 1407
Sending message to /10.1.0.1 | 17:27:50,843 |
10.2.0.3 | 1631
Message received from /10.2.0.5 | 17:27:50,851 |
10.1.0.1 | 23333
Processing response from /10.2.0.5 | 17:27:50,851 |
10.1.0.1 | 23380
Message received from /10.2.0.7 | 17:27:50,852 |
10.1.0.1 | 23908
Processing response from /10.2.0.7 | 17:27:50,852 |
10.1.0.1 | 23953
Message received from /10.2.0.6 | 17:27:50,853 |
10.1.0.1 | 25143
Processing response from /10.2.0.6 | 17:27:50,853 |
10.1.0.1 | 25178
Message received from /10.2.0.3 | 17:27:50,854 |
10.1.0.1 | 25478
Processing response from /10.2.0.3 | 17:27:50,854 |
10.1.0.1 | 25516
Write timeout; received 3 of 4 required replies | 17:27:52,831 |
10.1.0.1 | 2002873
Request complete | 17:27:52,833 |
10.1.0.1 | 2005989
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
