[ https://issues.apache.org/jira/browse/CASSANDRA-14715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551078#comment-17551078 ]

Stefan Miklosovic edited comment on CASSANDRA-14715 at 6/7/22 2:06 PM:
-----------------------------------------------------------------------

The proposed approach seems good to me, but I had a hard time testing this 
consistently. I tried the steps in the description of this ticket and sometimes 
I got the same response, sometimes I did not. Thinking about testing, I am not 
sure what test to write here. The options we have:

1) jvm dtest - this approach would be about setting up a 2x3 cluster, inserting 
data, shutting down a node, removing that node's sstables, starting it again, 
executing the query and watching its logs. I think the "removing of sstables" 
step is not necessary because, I think, the data dir of that node is 
automatically removed on shutdown. I am not sure about the details and 
viability of this test approach yet; a rough sketch of it follows after the list below.

2) same as 1 but it would be done in python dtests

3) Testing only RepairMergeListener and its close() method. This would be very 
nice to do, but I noticed that all inner classes in DataResolver 
(RepairMergeListener is an inner class of DataResolver) are private and 
non-static. I cannot easily test this class in isolation; I would need to 
rewrite them as static classes and so on, and this might have not-so-obvious 
consequences.
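
Regarding 1), below is a rough in-jvm dtest sketch of what I have in mind. It is only 
a sketch: the class and test names are illustrative, it uses a single DC with RF 3 
instead of the 2x3 layout from the description, it disables hints so the restarted 
node really misses the write, and it has not been compiled against the branch.

{code:java}
import org.apache.cassandra.distributed.Cluster;
import org.apache.cassandra.distributed.api.ConsistencyLevel;
import org.apache.cassandra.distributed.api.Feature;
import org.junit.Test;

public class ReadRepairTimeoutTest
{
    @Test
    public void readRepairShouldNotReportBogusTimeout() throws Throwable
    {
        try (Cluster cluster = Cluster.build(3)
                                      .withConfig(c -> c.with(Feature.NETWORK, Feature.GOSSIP)
                                                        .set("hinted_handoff_enabled", false))
                                      .start())
        {
            cluster.schemaChange("CREATE KEYSPACE weather WITH replication = " +
                                 "{'class': 'SimpleStrategy', 'replication_factor': 3}");
            cluster.schemaChange("CREATE TABLE weather.city (cityid int PRIMARY KEY, name text) " +
                                 "WITH read_repair_chance = 0 AND dclocal_read_repair_chance = 0");

            // Instead of deleting sstables by hand, let node 1 miss the write entirely.
            cluster.get(1).shutdown().get();
            cluster.coordinator(2).execute("INSERT INTO weather.city (cityid, name) VALUES (1, 'Canberra')",
                                           ConsistencyLevel.QUORUM);
            cluster.get(1).startup();

            // Reading through node 1 causes a digest mismatch and a foreground read repair.
            // To force the timeout itself, the read-repair mutations could be dropped via
            // cluster.filters(), mirroring the "always throw" hack from the description.
            // With the fix, this must not surface a ReadTimeoutException that claims CL ALL.
            Object[][] rows = cluster.coordinator(1).execute("SELECT * FROM weather.city WHERE cityid = 1",
                                                             ConsistencyLevel.LOCAL_QUORUM);
            // assert on rows and/or on node 1's logs here
        }
    }
}
{code}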

What I found interesting while testing this is that when I turned the node off, 
removed its data, turned it back on and listed the data dir of the respective 
table, the SSTable was there again. How is this possible? Could it be that the 
commit log was replayed and flushed on startup, or something like that? I think 
we would need to remove the commit logs too.
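
If we stay with the manual approach from the description, one way to make the repro 
deterministic might be to wipe the commit log segments together with the sstables, so 
that commit log replay cannot bring the row back on startup. Roughly (paths assume the 
ccm layout used in the description):

{noformat}
ccm node1 stop
rm -rf ~/.ccm/test_repair/node1/data0/weather/city-*/mc-*
rm -rf ~/.ccm/test_repair/node1/commitlogs/*
ccm node1 start
{noformat}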


> Read repairs can result in bogus timeout errors to the client
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-14715
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14715
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Legacy/Local Write-Read Paths
>            Reporter: Cameron Zemek
>            Assignee: Stefan Miklosovic
>            Priority: Low
>
> In RepairMergeListener:close() it does the following:
>  
> {code:java}
> try
> {
>     FBUtilities.waitOnFutures(repairResults, DatabaseDescriptor.getWriteRpcTimeout());
> }
> catch (TimeoutException ex)
> {
>     // We got all responses, but timed out while repairing
>     int blockFor = consistency.blockFor(keyspace);
>     if (Tracing.isTracing())
>         Tracing.trace("Timed out while read-repairing after receiving all {} data and digest responses", blockFor);
>     else
>         logger.debug("Timeout while read-repairing after receiving all {} data and digest responses", blockFor);
>     throw new ReadTimeoutException(consistency, blockFor-1, blockFor, true);
> }
> {code}
> This propagates up and gets sent to the client, and we have customers getting 
> confused because they see timeouts for CL ALL requiring ALL replicas even 
> though they have read_repair_chance = 0 and are using a LOCAL_* CL.
> At minimum I suggest that, instead of using the consistency level of DataResolver 
> (which is always ALL with read repairs) for the timeout, it use 
> repairResults.size(), that is blockFor = repairResults.size(). But saying it 
> received _blockFor - 1_ is still bogus. Fixing that would require more 
> changes. I was thinking maybe like so:
>  
> {code:java}
> public static void waitOnFutures(List<AsyncOneResponse> results, long ms, MutableInt counter) throws TimeoutException
> {
>     for (AsyncOneResponse result : results)
>     {
>         result.get(ms, TimeUnit.MILLISECONDS);
>         counter.increment();
>     }
> }
> {code}
> 
> Likewise in SinglePartitionReadLifecycle:maybeAwaitFullDataRead() it says 
> _blockFor - 1_ for how many were received, which is also bogus.
>  
> Steps used to reproduce: modify RepairMergeListener:close() to always 
> throw a timeout exception. With schema:
> {noformat}
> CREATE KEYSPACE weather WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': '3', 'dc2': '3'} AND durable_writes = true;
> CREATE TABLE weather.city (
> cityid int PRIMARY KEY,
> name text
> ) WITH bloom_filter_fp_chance = 0.01
> AND dclocal_read_repair_chance = 0.0
> AND read_repair_chance = 0.0
> AND speculative_retry = 'NONE';
> {noformat}
> Then using the following steps:
>  # ccm node1 cqlsh
>  # INSERT INTO weather.city(cityid, name) VALUES (1, 'Canberra');
>  # exit;
>  # ccm node1 flush
>  # ccm node1 stop
>  # rm -rf ~/.ccm/test_repair/node1/data0/weather/city-ff2fade0b18d11e8b1cd097acbab1e3d/mc-1-big-*
>  # remove the sstable with the insert
>  # ccm node1 start
>  # ccm node1 cqlsh
>  # CONSISTENCY LOCAL_QUORUM;
>  # select * from weather.city where cityid = 1;
> You get a result of:
> {noformat}
> ReadTimeout: Error from server: code=1200 [Coordinator node timed out waiting for replica nodes' responses]
> message="Operation timed out - received only 5 responses."
> info={'received_responses': 5, 'required_responses': 6, 'consistency': 'ALL'}
> {noformat}
> But was expecting:
> {noformat}
> ReadTimeout: Error from server: code=1200 [Coordinator node timed out waiting for replica nodes' responses]
> message="Operation timed out - received only 1 responses."
> info={'received_responses': 1, 'required_responses': 2, 'consistency': 'LOCAL_QUORUM'}
> {noformat}


