Thank you for explaining. I'll dig through the code to try to remember why we 
introduced eviction, just to make sure we aren't going to introduce a 
correctness issue in place of a perf/operational issue (which I am not claiming 
is the case btw, just not fully certain yet).

Also, Jaydeep, sorry for dropping the ball on this: I was under the impression 
this had lost importance, and hadn't realized it was pending all this time.

On Mon, Dec 15, 2025, at 6:41 PM, Runtian Liu wrote:
> Alex, you're absolutely right that this isn’t a correctness issue—the system 
> will eventually re-prepare the statement. The problem, however, shows up in 
> real production environments under high QPS.
> 
> When a node is serving a heavy workload, the race condition described in the 
> ticket causes repeated evictions followed by repeated re-prepare attempts. 
> Instead of a single re-prepare, we see a *storm* of re-prepare requests 
> hitting the coordinator. This quickly becomes expensive: it increases CPU 
> usage, adds latency, and in our case escalated into a cluster-wide 
> performance degradation. We actually experienced an outage triggered by this 
> behavior.
> 
> So while correctness is preserved, the operational impact is severe. 
> Preventing the unnecessary eviction avoids the re-prepare storm entirely, 
> which is why we believe this patch is important for stability in real 
> clusters.
> 
> 
> On Mon, Dec 15, 2025 at 8:00 AM Paulo Motta <[email protected]> wrote:
>> I wanted to note that I recently faced the issue described in this ticket in 
>> a real cluster. I'm not familiar enough with this area to know whether there 
>> are any negative implications of this patch.
>> 
>> So even if it's not a correctness issue per se, if it fixes a practical issue 
>> faced by users without negative consequences, I don't see why it should not 
>> be accepted, especially since it has been validated in production.
>> 
>> On Mon, 15 Dec 2025 at 07:28 Alex Petrov <[email protected]> wrote:
>>> iirc I reviewed it and mentioned this is not a correctness issue since we 
>>> would simply re-prepare. I can't recall why we needed to evict, but I think 
>>> this was for correctness reasons. 
>>> 
>>> Would you mind elaborating on why simply letting it get re-prepared is 
>>> harmful behavior? Or am I missing something and this has larger 
>>> implications?
>>> 
>>> To be clear, I am not opposed to this patch; I just want to understand the 
>>> implications better.
>>> 
>>> On Sun, Dec 14, 2025, at 9:03 PM, Jaydeep Chovatia wrote:
>>>> Hi
>>>> 
>>>> I had reported this bug (CASSANDRA-17401 
>>>> <https://issues.apache.org/jira/browse/CASSANDRA-17401>) in 2022 along 
>>>> with the fix (PR#3059 <https://github.com/apache/cassandra/pull/3059>) and 
>>>> a reproducer (PR#3058 <https://github.com/apache/cassandra/pull/3058>). 
>>>> I already applied this fix internally, and it has been working fine for 
>>>> many years. Now we can see one of the Cassandra users has been facing the 
>>>> exact same problem. I have told them to go with the private fix for now.
>>>> Paulo and Alex had partially reviewed it; could you (or someone) please 
>>>> complete the review so I can land it in the official repo?
>>>> 
>>>> Jaydeep
>>> 
