[jira] [Commented] (CASSANDRA-16334) Replica failure causes timeout on multi-DC write

Jira Wed, 13 Oct 2021 04:30:07 -0700


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-16334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17428161#comment-17428161
 ]


Andres de la Peña commented on CASSANDRA-16334:
-----------------------------------------------

Ok, the repeated dtest runs are failing because the MID resources config for 
CircleCI uses medium runners, whereas it should use large ones. Indeed, the 
test passes as part of the regular dtest jobs because those jobs correctly use 
large runners. I'll open a ticket to fix this. In the meantime, I have 
manually set the right resource class and the repeated runs pass, as 
expected:
||branch||CI||
|3.0|[j8|https://app.circleci.com/pipelines/github/adelapena/cassandra/1006/workflows/4c3774df-f49a-4e0b-b1a4-9e5bfee06087]|
|3.11|[j8|https://app.circleci.com/pipelines/github/adelapena/cassandra/1005/workflows/cd497fac-1348-4736-8b0d-fccb4dbaacbe]|
|4.0|[j8|https://app.circleci.com/pipelines/github/adelapena/cassandra/1007/workflows/c54bf75a-8005-4843-a432-40487f77b435] [j11|https://app.circleci.com/pipelines/github/adelapena/cassandra/1007/workflows/8814397a-d3fe-41ab-bdb5-fa10ca021494]|
|trunk|[j8|https://app.circleci.com/pipelines/github/adelapena/cassandra/1004/workflows/0fc6f761-4eb1-4f88-a281-cccf0c79cb48] [j11|https://app.circleci.com/pipelines/github/adelapena/cassandra/1004/workflows/97e60123-d1af-46d6-9973-654f14f7eb21]|

> Replica failure causes timeout on multi-DC write
> ------------------------------------------------
>
>                 Key: CASSANDRA-16334
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16334
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Coordination, Messaging/Internode
>            Reporter: Paulo Motta
>            Assignee: Aleksandr Sorokoumov
>            Priority: Normal
>             Fix For: 3.0.x, 3.11.x, 4.0.x, 4.x
>
>
> Inserting a mutation larger than {{max_mutation_size_in_kb}} correctly throws 
> a write error on a single DC keyspace with RF=3:
> {noformat}
> cassandra.WriteFailure: Error from server: code=1500 [Replica(s) failed to 
> execute write] message="Operation failed - received 0 responses and 3 
> failures: UNKNOWN from /127.0.0.3:7000, UNKNOWN from /127.0.0.2:7000, UNKNOWN 
> from /127.0.0.1:7000" info={'consistency': 'LOCAL_ONE', 'required_responses': 
> 1, 'received_responses': 0, 'failures': 3}
> {noformat}
> The same insert wrongly causes a timeout on a keyspace with two DCs (RF=3 each):
> {noformat}
> cassandra.WriteTimeout: Error from server: code=1100 [Coordinator node timed 
> out waiting for replica nodes' responses] message="Operation timed out - 
> received only 0 responses." info={'consistency': 'LOCAL_ONE', 
> 'required_responses': 1, 'received_responses': 0}
> {noformat}
> Reproduction steps:
> {noformat}
> # Setup cluster
> ccm create -n 3:3 test
> for i in {1..6}; do echo 'max_mutation_size_in_kb: 1000' >> ~/.ccm/test/node$i/conf/cassandra.yaml; done
> ccm start
> # Create schema
> ccm node1 cqlsh
> CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
> 'dc1': 3, 'dc2': 3};
> CREATE TABLE test.test (key int PRIMARY KEY, val blob);
> exit;
> # Insert data
> python
> from cassandra.cluster import Cluster
> cluster = Cluster()
> session = cluster.connect('test')
> blob = open("2mbBlob", "rb").read().hex()
> session.execute("INSERT INTO test (key, val) VALUES (1, textAsBlob('" + blob 
> + "'))")
> {noformat}
> Reproduced in 3.0, 3.11, 4.0, trunk.
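
The reproduction steps assume a file named 2mbBlob already exists. A minimal sketch of one way to create it and sanity-check that the resulting payload is above the configured limit; the helper names write_blob and exceeds_limit are hypothetical, and the check assumes the limit is counted in units of 1024 bytes and ignores any per-mutation overhead:

```python
import os
import tempfile

# Value appended to cassandra.yaml by the reproduction steps.
MAX_MUTATION_SIZE_KB = 1000


def write_blob(path, size_bytes=2 * 1024 * 1024):
    """Write size_bytes of random data to path and return it hex-encoded,
    which is the text the repro passes to textAsBlob()."""
    with open(path, "wb") as f:
        f.write(os.urandom(size_bytes))
    with open(path, "rb") as f:
        return f.read().hex()


def exceeds_limit(hex_text, limit_kb=MAX_MUTATION_SIZE_KB):
    """Rough check only: textAsBlob() stores the UTF-8 bytes of the text,
    so the payload is two bytes per source byte. Assumes 1 KB = 1024 bytes."""
    return len(hex_text.encode("utf-8")) > limit_kb * 1024


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "2mbBlob")
    blob = write_blob(path)
    print(path, len(blob), exceeds_limit(blob))
```

Because the hex text is twice the size of the file, even a file of roughly half the limit would already produce an oversized value under these assumptions.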



