Re: CAS operation result is unknown - proposal accepted by 1 but not a quorum

Jeff Jirsa Wed, 12 Apr 2023 06:30:41 -0700

Are you always inserting into the same partition (with contention) or into 
different ones?


Which version are you using?

The TL;DR is that the failure modes of the existing Paxos implementation 
(under contention, under latency, under cluster strain) can leave the outcome 
of an operation undefined. I believe that a subsequent serial read will 
deterministically resolve the state (look at CASSANDRA-12126), but that has a 
cost (both the extra operation and the code complexity).
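
A rough sketch of that "serial read after an unknown result" pattern, in 
Python. Everything named here is hypothetical (CasResultUnknown, apply_lwt, 
do_cas, serial_read, my_token); none of it is a driver API. It assumes each 
client writes a value unique to itself, so that reading the value back shows 
whose write won, and it assumes the serial read really does settle the state 
as described above.

```python
class CasResultUnknown(Exception):
    """Hypothetical stand-in for the client-side error raised when the
    outcome of a CAS write is unknown (e.g. a timeout in the propose stage)."""


def apply_lwt(do_cas, serial_read, my_token):
    """Run a conditional write; if its outcome is unknown, settle it by reading.

    do_cas()      -> bool: True if the conditional write was applied.
                     May raise CasResultUnknown.
    serial_read() -> the current column value, read at SERIAL consistency.
    my_token      -> the unique value this client tried to write.
    """
    try:
        return do_cas()
    except CasResultUnknown:
        # Assumption: the serial read resolves any in-flight proposal, so the
        # value it returns tells us whether our write was the one applied.
        return serial_read() == my_token
```

The callables are injected so the sketch stays independent of any particular 
driver; in a real client they would wrap the actual LWT statement and a read 
of the same row at serial consistency. The extra read on every unknown result 
is the cost mentioned above.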

The upcoming transactional rewrite will likely change this, but it’s still WIP 
(CEP-15)




> On Apr 12, 2023, at 6:11 AM, Ralph Boehme <s...@samba.org> wrote:
> 
> On 4/11/23 21:14, Ralph Boehme wrote:
>>> On 4/11/23 19:53, Bowen Song via user wrote:
>>> That error message sounds like one of the nodes timed out in the paxos 
>>> propose stage.  You can check the system.log and gc.log and see if you can 
>>> find anything unusual in them, such as network errors, out of sync clocks 
>>> or long stop-the-world GC pauses.
>> hm, I'll check the logs, but I can reproduce this 100% on an idle test 
>> cluster just by running a simple test client that generates a smallish 
>> workload where just 2 processes on a single host hammer the Cassandra 
>> cluster with LWTs.
> 
> nothing in the logs really.
> 
>> Maybe LWTs are not meant to be used this way?
> 
> fwiw, this happens 100% within a few seconds with a workload where two 
> clients hammer with LWTs on a single row.
> 
> Thanks!
> -slow
>
