[ 
https://issues.apache.org/jira/browse/CASSANDRA-17401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17821475#comment-17821475
 ] 

Paulo Motta commented on CASSANDRA-17401:
-----------------------------------------

Thanks for the detailed reports and repro steps. I've taken a look and this 
looks to me to be a legitimate race condition that can cause a re-prepare storm 
under large concurrency and unlucky timing.

My understanding is that [these evict 
statements|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/QueryProcessor.java#L735]
 are not required for the correctness of the upgrade compatibility logic and 
can be safely removed. Would you have some cycles to confirm this [~ifesdjeen] ?

In addition to this, I think there's a pending issue from CASSANDRA-17248 that 
can leak prepared statements between keyspaces during mixed upgrade mode. Since 
these issues are in a related area I think it makes sense to address them 
together (in separate commits) to ensure these changes are tested together.

I think the {{PreparedStatementCollisionTest}} suite from [this 
commit|https://github.com/apache/cassandra/pull/1872/commits/758bc4a89d7ca9d0bfe27e6f41000484724261bc]
 can help improve the validation coverage of this logic. That change looks 
correct to me but may need some cleanup. We should probably keep the metric 
changes out of this to keep the scope of this patch to a minimum.

After proper review and validation I think there's value in including these 
fixes in the final 3.X releases to address these outstanding issues as users 
will still do upgrade cycles as 5.x release approaches. This will make 
resolution more laborious as we will need to provide patches for 3.x all the 
way up to trunk + CI for all branches. What do you think [~brandon.williams] 
[~stefan.miklosovic]  ?

> Race condition in QueryProcessor causes just prepared statement not to be in 
> the prepared statements cache
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-17401
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17401
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Messaging/Client
>            Reporter: Ivan Senic
>            Assignee: Jaydeepkumar Chovatia
>            Priority: Normal
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The changes in the 
> [QueryProcessor#prepare|https://github.com/apache/cassandra/blame/cassandra-4.0.2/src/java/org/apache/cassandra/cql3/QueryProcessor.java#L575-L638]
>  method that were introduced in versions *4.0.2* and *3.11.12* can cause a 
> race condition between two threads trying to concurrently prepare the same 
> statement. This race condition can cause removing of a prepared statement 
> from the cache, after one of the threads has received the result of the 
> prepare and eventually uses MD5Digest to call 
> [QueryProcessor#getPrepared|https://github.com/apache/cassandra/blame/cassandra-4.0.2/src/java/org/apache/cassandra/cql3/QueryProcessor.java#L212-L215].
> The race condition looks like this:
>  * Thread1 enters _prepare_ method and resolves _safeToReturnCached_ as false
>  * Thread1 executes eviction of hashes
>  * Thread2 enters _prepare_ method and resolves _safeToReturnCached_ as false
>  * Thread1 prepares the statement and caches it
>  * Thread1 returns the result of the prepare
>  * Thread2 executes eviction of hashes
>  * Thread1 tries to execute the prepared statement with the received 
> MD5Digest, but statement is not in the cache as it was evicted by Thread2
> I tried to reproduce this by using a Java driver, but hitting this case from 
> a client side is highly unlikely and I can not simulate the needed race 
> condition. However, we can easily reproduce this in Stargate (details 
> [here|https://github.com/stargate/stargate/pull/1647]), as it's closer to 
> QueryProcessor.
> Reproducing this in a unit test is fairly easy. I am happy to showcase this 
> if needed.
> Note that the issue can occur only when  safeToReturnCached is resolved as 
> false.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to