[
https://issues.apache.org/jira/browse/CASSANDRA-17401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17808931#comment-17808931
]
Jaydeepkumar Chovatia commented on CASSANDRA-17401:
---------------------------------------------------
We have been facing the exact same problem in our production environment.
As part of CASSANDRA-17248 (in C* 3.0.26), the following code was introduced in
[QueryProcessor.java|https://github.com/apache/cassandra/commit/242f7f9b18db77bce36c9bba00b2acda4ff3209e#r137491766]
{code:java}
// Make sure the missing one is going to be eventually re-prepared
evictPrepared(hashWithKeyspace);
evictPrepared(hashWithoutKeyspace);
{code}
This code can create a race condition between two concurrent calls to
[_QueryProcessor::prepare_|https://github.com/apache/cassandra/blob/cassandra-4.0/src/java/org/apache/cassandra/cql3/QueryProcessor.java#L575]:
one thread adds the statement to the cache while another thread silently
removes it. With thousands of threads calling the API, it is quite possible
for one thread to update the cache and for another thread to then remove the
same entry.
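A minimal sketch of that interleaving, following the steps listed in the issue description and replayed sequentially on a single thread so that it is deterministic. The map and the helper names (CACHE, evict, store, replayInterleaving) are hypothetical stand-ins for illustration, not the actual QueryProcessor code:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class PrepareRaceSketch {
    // Hypothetical stand-in for the prepared statements cache: statement id -> query.
    static final ConcurrentMap<String, String> CACHE = new ConcurrentHashMap<>();

    // Stand-in for the eviction step at the start of prepare().
    static void evict(String id) {
        CACHE.remove(id);
    }

    // Stand-in for the "prepare and cache" step; returns the id handed to the client.
    static String store(String id, String query) {
        CACHE.put(id, query);
        return id;
    }

    // Replays the problematic ordering of two prepare() calls on one thread.
    static boolean replayInterleaving() {
        CACHE.clear();
        String id = "md5-of-query"; // placeholder for the real MD5Digest

        evict(id);                                          // Thread1: evicts the hashes
        // Thread2 has entered prepare() but has not evicted yet.
        String returned = store(id, "SELECT * FROM ks.tbl"); // Thread1: caches and returns
        evict(id);                                          // Thread2: evicts the hashes

        // Thread1 now tries to execute with the id it was given.
        return CACHE.containsKey(returned);
    }

    public static void main(String[] args) {
        System.out.println("statement still cached: " + replayInterleaving());
    }
}
```

In this ordering the lookup fails even though Thread1 has just received a successful prepare result.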
If we look at the code of [Cassandra
3.0.25|https://github.com/apache/cassandra/blob/cassandra-3.0.25/src/java/org/apache/cassandra/cql3/QueryProcessor.java#L391],
no such eviction was present. Hence, this looks like a regression introduced
in Cassandra 3.0.26.
To fix this issue, we should not *evict* the cache entries, i.e., the
eviction code shown above, introduced in C* 3.0.26, is not required.
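As a rough illustration of why dropping the eviction closes this particular window, here is a sketch in which concurrent callers only ever add to the cache. It assumes a plain ConcurrentHashMap with putIfAbsent as a stand-in for the real cache; the class and method names are hypothetical and this is not the actual patch:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PrepareWithoutEvictionSketch {
    // Hypothetical stand-in for the prepared statements cache.
    static final ConcurrentMap<String, String> CACHE = new ConcurrentHashMap<>();

    // prepare() without the eviction step: it can add an entry but never remove one.
    static String prepare(String id, String query) {
        CACHE.putIfAbsent(id, query);
        return id;
    }

    // Runs prepare() for the same statement from several threads released at once.
    static boolean runConcurrently(int threads) {
        CACHE.clear();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                try {
                    start.await(); // hold all workers until the latch opens
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                prepare("md5-of-query", "SELECT * FROM ks.tbl");
            });
        }
        start.countDown();
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                return false; // workers did not finish in time
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        // No code path removes the entry, so it must still be present.
        return CACHE.containsKey("md5-of-query");
    }

    public static void main(String[] args) {
        System.out.println("statement still cached: " + runConcurrently(8));
    }
}
```

Whatever order the threads run in, no caller can invalidate an id that another caller has already been handed.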
[~ifesdjeen], could you please take a look at it?
> Race condition in QueryProcessor causes just prepared statement not to be in
> the prepared statements cache
> ----------------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-17401
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17401
> Project: Cassandra
> Issue Type: Bug
> Reporter: Ivan Senic
> Priority: Normal
>
> The changes in the
> [QueryProcessor#prepare|https://github.com/apache/cassandra/blame/cassandra-4.0.2/src/java/org/apache/cassandra/cql3/QueryProcessor.java#L575-L638]
> method that were introduced in versions *4.0.2* and *3.11.12* can cause a
> race condition between two threads trying to concurrently prepare the same
> statement. This race condition can cause a prepared statement to be removed
> from the cache after one of the threads has received the result of the
> prepare but before it uses the returned MD5Digest to call
> [QueryProcessor#getPrepared|https://github.com/apache/cassandra/blame/cassandra-4.0.2/src/java/org/apache/cassandra/cql3/QueryProcessor.java#L212-L215].
> The race condition looks like this:
> * Thread1 enters _prepare_ method and resolves _safeToReturnCached_ as false
> * Thread1 executes eviction of hashes
> * Thread2 enters _prepare_ method and resolves _safeToReturnCached_ as false
> * Thread1 prepares the statement and caches it
> * Thread1 returns the result of the prepare
> * Thread2 executes eviction of hashes
> * Thread1 tries to execute the prepared statement with the received
> MD5Digest, but the statement is not in the cache because it was evicted by Thread2
> I tried to reproduce this by using a Java driver, but hitting this case from
> the client side is highly unlikely and I cannot simulate the needed race
> condition. However, we can easily reproduce this in Stargate (details
> [here|https://github.com/stargate/stargate/pull/1647]), as it's closer to
> QueryProcessor.
> Reproducing this in a unit test is fairly easy. I am happy to showcase this
> if needed.
> Note that the issue can occur only when safeToReturnCached is resolved as
> false.