[ 
https://issues.apache.org/jira/browse/CASSANDRA-8819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325696#comment-14325696
 ] 

Sylvain Lebresne commented on CASSANDRA-8819:
---------------------------------------------

There is both a joining and a leaving node (in DC2), so what probably happens 
is that for some writes there is 2 "pending endpoints". During a write, if 
there is a joining node and that node is a future replica for the write (no 
matter its datacenter), we add it to the number of nodes waited for (thus 
making us temporarily wait for more node than strictly required by the 
consistency) to avoid potentially failing CL guarantees post-bootstrap, see 
CASSANDRA-833. I'll note that this is the explanation for the error message, 
and the error message is not really the problem here: cassandra *does* 
sometimes wait for more node that what the CL initially suggests and that is 
*not* a bug.

What is a bug though is that this code is broken for LOCAL_QUORUM reads because 
while we can end up waiting for more nodes than there is replica in the local 
datacenter, {{DataCenterWriteResponseHandler}} ignores acks from other DC nodes 
and so it's a guaranteed timeout in that situation (even though, in the 
example, there should be at least 1 replica up in DC2 and so we most likely did 
got the proper number of answers, we just ignored one).

We could fix {{DataCenterWriteResponseHandler}} to count _some_ responses from 
remote DCs when there is pending endpoints (as long as we still fulfill a local 
quorum of course), but the bigger question is maybe whether we should include 
pending endpoints that are not local in 
{{AbstractWriteResponseHandler.totalBlockFor}} in the first place. I'll note 
that _in theory_, counting them as we do is kind of the correct thing to do: CL 
guarantees are based on the combination of the write and read CL, and we don't 
know what CL the user will use on read. But in practice, using {{LOCAL_QUORUM}} 
means you only care about local DC guarantees and so I think it would be fine 
and a lot more practical to exclude non-local endpoints.

Thoughts ([~jbellis] in particular since you were the author of CASSANDRA-833)? 
 

> LOCAL_QUORUM writes returns wrong message
> -----------------------------------------
>
>                 Key: CASSANDRA-8819
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8819
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: CentOS 6.6
>            Reporter: Wei Zhu
>            Assignee: Tyler Hobbs
>             Fix For: 2.0.13
>
>
> We have two DC3, each with 7 nodes.
> Here is the keyspace setup:
>  create keyspace test
>  with placement_strategy = 'NetworkTopologyStrategy'
>  and strategy_options = {DC2 : 3, DC1 : 3}
>  and durable_writes = true;
> We brought down two nodes in DC2 for maintenance. We only write to DC1 using 
> local_quroum (using datastax JavaClient)
> But we see this errors in the log:
> Cassandra timeout during write query at consistency LOCAL_QUORUM (4 replica 
> were required but only 3 acknowledged the write
> why does it say 4 replica were required? and Why would it give error back to 
> client since local_quorum should succeed.
> Here are the output from nodetool status
> Note: Ownership information does not include topology; for complete 
> information, specify a keyspace
> Datacenter: DC2
> ===============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address      Load       Tokens  Owns   Host ID                            
>    Rack
> UN  10.2.0.1  10.92 GB   256     7.9%   XXXXXXXXXXXXXXXXXXXXXXXXXXXX  RAC206
> UN  10.2.0.2   6.17 GB    256     8.0%   XXXXXXXXXXXXXXXXXXXXXXXXXXXX  RAC106
> UN  10.2.0.3  6.63 GB    256     7.3%   XXXXXXXXXXXXXXXXXXXXXXXXXXXX  RAC107
> DL  10.2.0.4  1.54 GB    256     7.7%   XXXXXXXXXXXXXXXXXXXXXXXXXXXX RAC107
> UN  10.2.0.5  6.02 GB    256     6.6%   XXXXXXXXXXXXXXXXXXXXXXXXXXXX  RAC106
> UJ  10.2.0.6   3.68 GB    256     ?      XXXXXXXXXXXXXXXXXXXXXXXXXXXX  RAC205
> UN  10.2.0.7  7.22 GB    256     7.7%   XXXXXXXXXXXXXXXXXXXXXXXXXXXX RAC205
> Datacenter: DC1
> ===============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address      Load       Tokens  Owns   Host ID                            
>    Rack
> UN  10.1.0.1   6.04 GB    256     8.6%   XXXXXXXXXXXXXXXXXXXXXXXXXXXX RAC10
> UN  10.1.0.2   7.55 GB    256     7.4%   XXXXXXXXXXXXXXXXXXXXXXXXXXXX  RAC8
> UN  10.1.0.3   5.83 GB    256     7.0%   XXXXXXXXXXXXXXXXXXXXXXXXXXXX  RAC9
> UN  10.1.0.4    7.34 GB    256     7.9%   XXXXXXXXXXXXXXXXXXXXXXXXXXXX  RAC6
> UN  10.1.0.5   7.57 GB    256     8.0%   XXXXXXXXXXXXXXXXXXXXXXXXXXXX RAC7
> UN  10.1.0.6   5.31 GB    256     7.3%   XXXXXXXXXXXXXXXXXXXXXXXXXXXX  RAC10
> UN  10.1.0.7   5.47 GB    256     8.6%   XXXXXXXXXXXXXXXXXXXXXXXXXXXX RAC9
> I did a cql trace on the query and here is the trace, and it does say 
>    Write timeout; received 3 of 4 required replies | 17:27:52,831 |  10.1.0.1 
> |        2002873
> at the end. I guess that is where the client gets the error from. But the 
> rows was inserted to Cassandra correctly. And I traced read with local_quorum 
> and it behaves correctly and the reads don't go to DC2. The problem is only 
> with writes on local_quorum.
> {code}
> Tracing session: 5a789fb0-b70d-11e4-8fca-99bff9c19890
>  activity                                                                     
>                                                                | timestamp    
> | source      | source_elapsed
> ---------------------------------------------------------------------------------------------------------------------------------------------+--------------+-------------+----------------
>                                                                               
>                                             execute_cql3_query | 17:27:50,828 
> |  10.1.0.1 |              0
>  Parsing insert into test (user_id, created, event_data, event_id)values ( 
> 123456789 , 9eab8950-b70c-11e4-8fca-99bff9c19891, 'test', '16'); | 
> 17:27:50,828 |  10.1.0.1 |             39
>                                                                               
>                                            Preparing statement | 17:27:50,828 
> |  10.1.0.1 |            135
>                                                                               
>                              Message received from /10.1.0.1 | 17:27:50,829 | 
>  10.1.0.5 |             25
>                                                                               
>                                 Sending message to /10.1.0.5 | 17:27:50,829 | 
>  10.1.0.1 |            421
>                                                                               
>                      Executing single-partition query on users | 17:27:50,829 
> |  10.1.0.5 |            177
>                                                                               
>                                   Acquiring sstable references | 17:27:50,829 
> |  10.1.0.5 |            191
>                                                                               
>                                    Merging memtable tombstones | 17:27:50,830 
> |  10.1.0.5 |            208
>                                                                               
>                              Message received from /10.1.0.5 | 17:27:50,830 | 
>  10.1.0.1 |           1461
>                                                                               
>                              Message received from /10.1.0.1 | 17:27:50,830 | 
>  10.1.0.2 |             14
>                                                                               
>                                    Key cache hit for sstable 1 | 17:27:50,830 
> |  10.1.0.5 |            254
>                                                                               
>                           Processing response from /10.1.0.5 | 17:27:50,830 | 
>  10.1.0.1 |           1514
>                                                                               
>                      Executing single-partition query on users | 17:27:50,830 
> |  10.1.0.2 |             78
>                                                                               
>                    Seeking to partition beginning in data file | 17:27:50,830 
> |  10.1.0.5 |            264
>                                                                               
>                                 Sending message to /10.1.0.2 | 17:27:50,830 | 
>  10.1.0.1 |           1517
>                                                                               
>                                   Acquiring sstable references | 17:27:50,830 
> |  10.1.0.2 |             85
>                                                                    Skipped 
> 0/1 non-slice-intersecting sstables, included 0 due to tombstones | 
> 17:27:50,830 |  10.1.0.5 |            453
>                                                                               
>                              Determining replicas for mutation | 17:27:50,830 
> |  10.1.0.1 |           1746
>                                                                               
>                                    Merging memtable tombstones | 17:27:50,830 
> |  10.1.0.2 |             97
>                                                                               
>                     Merging data from memtables and 1 sstables | 17:27:50,830 
> |  10.1.0.5 |            476
>                                                                               
>                              Message received from /10.1.0.2 | 17:27:50,830 | 
>  10.1.0.1 |           2369
>                                                                               
>                                    Key cache hit for sstable 2 | 17:27:50,830 
> |  10.1.0.2 |            120
>                                                                               
>                             Read 1 live and 0 tombstoned cells | 17:27:50,830 
> |  10.1.0.5 |            506
>                                                                               
>                    Seeking to partition beginning in data file | 17:27:50,830 
> |  10.1.0.2 |            123
>                                                                               
>                              Enqueuing response to /10.1.0.1 | 17:27:50,830 | 
>  10.1.0.5 |            570
>                                                                    Skipped 
> 0/1 non-slice-intersecting sstables, included 0 due to tombstones | 
> 17:27:50,830 |  10.1.0.2 |            288
>                                                                               
>                                 Sending message to /10.1.0.1 | 17:27:50,830 | 
>  10.1.0.5 |            617
>                                                                               
>                     Merging data from memtables and 1 sstables | 17:27:50,830 
> |  10.1.0.2 |            297
>                                                                               
>                             Read 1 live and 0 tombstoned cells | 17:27:50,830 
> |  10.1.0.2 |            319
>                                                                               
>                              Enqueuing response to /10.1.0.1 | 17:27:50,830 | 
>  10.1.0.2 |            362
>                                                                               
>                                 Sending message to /10.1.0.1 | 17:27:50,830 | 
>  10.1.0.2 |            420
>                                                                               
>                              Message received from /10.1.0.1 | 17:27:50,831 | 
>   10.1.0.4 |             17
>                                                                               
>                                 Sending message to /10.1.0.2 | 17:27:50,831 | 
>  10.1.0.1 |           2435
>                                                                               
>                              Message received from /10.1.0.1 | 17:27:50,831 | 
>  10.1.0.2 |              8
>                                                                               
>                                 Acquiring switchLock read lock | 17:27:50,831 
> |   10.1.0.4 |             61
>                                                                               
>                           Processing response from /10.1.0.2 | 17:27:50,831 | 
>  10.1.0.1 |           2488
>                                                                               
>                                 Acquiring switchLock read lock | 17:27:50,831 
> |  10.1.0.2 |             44
>                                                                               
>                                         Appending to commitlog | 17:27:50,831 
> |   10.1.0.4 |             78
>                                                                               
>       Not hinting /10.2.0.4 which has been down 364809650ms | 17:27:50,831 |  
> 10.1.0.1 |           2503
>                                                                               
>                                         Appending to commitlog | 17:27:50,831 
> |  10.1.0.2 |             62
>                                                                               
>                                       Adding to event memtable | 17:27:50,831 
> |   10.1.0.4 |             96
>                                                                               
>                                  Sending message to /10.1.0.4 | 17:27:50,831 
> |  10.1.0.1 |           2557
>                                                                               
>                                       Adding to event memtable | 17:27:50,831 
> |  10.1.0.2 |             80
>                                                                               
>                              Enqueuing response to /10.1.0.1 | 17:27:50,831 | 
>   10.1.0.4 |            177
>                                                                               
>                                 Acquiring switchLock read lock | 17:27:50,831 
> |  10.1.0.1 |           2669
>                                                                               
>                              Enqueuing response to /10.1.0.1 | 17:27:50,831 | 
>  10.1.0.2 |            160
>                                                                               
>                                 Sending message to /10.1.0.1 | 17:27:50,831 | 
>   10.1.0.4 |            333
>                                                                               
>                                         Appending to commitlog | 17:27:50,831 
> |  10.1.0.1 |           2682
>                                                                               
>                                 Sending message to /10.1.0.1 | 17:27:50,831 | 
>  10.1.0.2 |            266
>                                                                               
>                                       Adding to event memtable | 17:27:50,831 
> |  10.1.0.1 |           2720
>                                                                               
>                                Sending message to /10.2.0.5 | 17:27:50,831 |  
> 10.1.0.1 |           2758
>                                                                               
>                              Message received from /10.1.0.2 | 17:27:50,831 | 
>  10.1.0.1 |           2989
>                                                                               
>                           Processing response from /10.1.0.2 | 17:27:50,831 | 
>  10.1.0.1 |           3024
>                                                                               
>                               Message received from /10.1.0.4 | 17:27:50,832 
> |  10.1.0.1 |           3764
>                                                                               
>                            Processing response from /10.1.0.4 | 17:27:50,832 
> |  10.1.0.1 |           3805
>                                                                               
>                              Message received from /10.1.0.1 | 17:27:50,841 | 
> 10.2.0.5 |             24
>                                                                               
>                      Enqueuing forwarded write to /10.2.0.7 | 17:27:50,841 | 
> 10.2.0.5 |            255
>                                                                               
>                      Enqueuing forwarded write to /10.2.0.3 | 17:27:50,841 | 
> 10.2.0.5 |            283
>                                                                               
>                       Enqueuing forwarded write to /10.2.0.6 | 17:27:50,841 | 
> 10.2.0.5 |            307
>                                                                               
>                                 Acquiring switchLock read lock | 17:27:50,841 
> | 10.2.0.5 |            362
>                                                                               
>                                         Appending to commitlog | 17:27:50,841 
> | 10.2.0.5 |            380
>                                                                               
>                                Sending message to /10.2.0.7 | 17:27:50,841 | 
> 10.2.0.5 |            382
>                                                                               
>                                       Adding to event memtable | 17:27:50,841 
> | 10.2.0.5 |            411
>                                                                               
>                                 Sending message to /10.2.0.6 | 17:27:50,841 | 
> 10.2.0.5 |            429
>                                                                               
>                              Enqueuing response to /10.1.0.1 | 17:27:50,841 | 
> 10.2.0.5 |            484
>                                                                               
>                                Sending message to /10.2.0.3 | 17:27:50,841 | 
> 10.2.0.5 |            561
>                                                                               
>                                 Sending message to /10.1.0.1 | 17:27:50,841 | 
> 10.2.0.5 |            625
>                                                                               
>                                 Acquiring switchLock read lock | 17:27:50,842 
> |  10.2.0.6 |           1031
>                                                                               
>                                 Acquiring switchLock read lock | 17:27:50,842 
> | 10.2.0.7 |            178
>                                                                               
>                                         Appending to commitlog | 17:27:50,842 
> |  10.2.0.6 |           1066
>                                                                               
>                                         Appending to commitlog | 17:27:50,842 
> | 10.2.0.7 |            196
>                                                                               
>                                       Adding to event memtable | 17:27:50,842 
> |  10.2.0.6 |           1118
>                                                                               
>                                       Adding to event memtable | 17:27:50,842 
> | 10.2.0.7 |            231
>                                                                               
>                              Enqueuing response to /10.1.0.1 | 17:27:50,842 | 
>  10.2.0.6 |           1181
>                                                                               
>                              Enqueuing response to /10.1.0.1 | 17:27:50,842 | 
> 10.2.0.7 |            286
>                                                                               
>                                 Sending message to /10.1.0.1 | 17:27:50,842 | 
> 10.2.0.7 |            421
>                                                                               
>                                 Acquiring switchLock read lock | 17:27:50,843 
> | 10.2.0.3 |           1216
>                                                                               
>                                 Sending message to /10.1.0.1 | 17:27:50,843 | 
>  10.2.0.6 |           1382
>                                                                               
>                                         Appending to commitlog | 17:27:50,843 
> | 10.2.0.3 |           1239
>                                                                               
>                                       Adding to event memtable | 17:27:50,843 
> | 10.2.0.3 |           1313
>                                                                               
>                              Enqueuing response to /10.1.0.1 | 17:27:50,843 | 
> 10.2.0.3 |           1407
>                                                                               
>                                 Sending message to /10.1.0.1 | 17:27:50,843 | 
> 10.2.0.3 |           1631
>                                                                               
>                             Message received from /10.2.0.5 | 17:27:50,851 |  
> 10.1.0.1 |          23333
>                                                                               
>                          Processing response from /10.2.0.5 | 17:27:50,851 |  
> 10.1.0.1 |          23380
>                                                                               
>                             Message received from /10.2.0.7 | 17:27:50,852 |  
> 10.1.0.1 |          23908
>                                                                               
>                          Processing response from /10.2.0.7 | 17:27:50,852 |  
> 10.1.0.1 |          23953
>                                                                               
>                              Message received from /10.2.0.6 | 17:27:50,853 | 
>  10.1.0.1 |          25143
>                                                                               
>                           Processing response from /10.2.0.6 | 17:27:50,853 | 
>  10.1.0.1 |          25178
>                                                                               
>                             Message received from /10.2.0.3 | 17:27:50,854 |  
> 10.1.0.1 |          25478
>                                                                               
>                          Processing response from /10.2.0.3 | 17:27:50,854 |  
> 10.1.0.1 |          25516
>                                                                               
>                Write timeout; received 3 of 4 required replies | 17:27:52,831 
> |  10.1.0.1 |        2002873
>                                                                               
>                                               Request complete | 17:27:52,833 
> |  10.1.0.1 |        2005989
> {code}
> cqlsh:XXXX> CONSISTENCY
> Current consistency level is LOCAL_QUORUM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to