[ https://issues.apache.org/jira/browse/CASSANDRA-13419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15957793#comment-15957793 ]

Ariel Weisberg commented on CASSANDRA-13419:
--------------------------------------------

[~slebresne] [~pauloricardomg]

Sylvain, I don't follow your suggestion.

So the starting point is that the quorum used for the condition check or the 
serial read must include at least one replica that responded to PREPARE. This 
is what fixes the stale read issue from CASSANDRA-8346.
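The requirement matters because two majorities are only guaranteed to overlap 
when they are drawn from the same replica set. The Java sketch below uses 
made-up endpoint names A through E and a plain majority rule (not Cassandra's 
actual Paxos participant computation) to show a read quorum that is satisfied 
by count yet contains no PREPARE responder, which is the case the requirement 
rules out.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class PrepareIntersection {
    // True if at least one read responder also responded to PREPARE.
    public static boolean includesPrepareResponder(Set<String> prepareResponders,
                                                   Set<String> readResponders) {
        return !Collections.disjoint(prepareResponders, readResponders);
    }

    // Simple majority of a replica set of the given size (illustrative rule only).
    public static int majority(int replicaCount) {
        return replicaCount / 2 + 1;
    }

    public static void main(String[] args) {
        // Hypothetical: PREPARE went to A, B, C and a majority (A, B) answered.
        Set<String> prepareResponders = new TreeSet<>(List.of("A", "B"));
        // Hypothetical: the read is computed over a larger set A..E (two extra
        // pending nodes), whose majority is 3, so C, D, E alone satisfy it.
        List<String> readReplicas = List.of("A", "B", "C", "D", "E");
        Set<String> readResponders = new TreeSet<>(List.of("C", "D", "E"));

        boolean quorumMet = readResponders.size() >= majority(readReplicas.size());
        System.out.println("read quorum met: " + quorumMet);
        System.out.println("includes a PREPARE responder: "
                + includesPrepareResponder(prepareResponders, readResponders));
    }
}
```

Running it prints that the read quorum is met but includes no PREPARE 
responder, i.e. a read that could miss the value being proposed.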

So we would only consider a node pending for CAS for the length of a Paxos 
round's timeout, on the grounds that if it has been pending longer than that, 
it must have been part of the PREPARE quorum? What drives that guarantee?

Could we do something very simple, like remembering who was in the QUORUM for 
PREPARE and requiring a response from at least one of them when doing the 
condition check or read?
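A minimal sketch of that idea, assuming the coordinator keeps the set of 
endpoints that answered PREPARE and hands it to the subsequent read. 
`PrepareAwareReadTracker` and its methods are hypothetical names, not existing 
Cassandra classes. The read is treated as complete only once it has the 
required number of responses and at least one of them came from a PREPARE 
responder.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical read-side tracker: completes only with a quorum of responses
// AND at least one response from a node that answered PREPARE.
public class PrepareAwareReadTracker {
    private final Set<String> prepareResponders;
    private final int requiredResponses;
    private final Set<String> responded = new HashSet<>();
    private boolean sawPrepareResponder = false;

    public PrepareAwareReadTracker(Set<String> prepareResponders, int requiredResponses) {
        this.prepareResponders = Set.copyOf(prepareResponders);
        this.requiredResponses = requiredResponses;
    }

    // Records a response (duplicates are ignored); returns true once the read may complete.
    public synchronized boolean onResponse(String endpoint) {
        if (responded.add(endpoint) && prepareResponders.contains(endpoint)) {
            sawPrepareResponder = true;
        }
        return isComplete();
    }

    public synchronized boolean isComplete() {
        return responded.size() >= requiredResponses && sawPrepareResponder;
    }

    public static void main(String[] args) {
        PrepareAwareReadTracker tracker = new PrepareAwareReadTracker(Set.of("A", "B"), 3);
        System.out.println(tracker.onResponse("C")); // false
        System.out.println(tracker.onResponse("D")); // false
        System.out.println(tracker.onResponse("E")); // false: count met, no PREPARE responder yet
        System.out.println(tracker.onResponse("A")); // true
    }
}
```

One consequence of this design is that a read may have to wait for more than a 
bare quorum of responses, and it would time out rather than complete if none of 
the PREPARE responders answers.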

I don't think having the pending node be in a different state is especially 
hard either. We can record a timestamp when it first joins and then compare how 
long it has been pending when deciding whether it counts as pending for the 
purposes of Paxos. But what are we measuring time since? Since the coordinator 
first learned about the pending node via Gossip?
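One way to sketch the timestamp approach, assuming (as the question above 
suggests) that the clock starts when the coordinator first learns about the 
pending node via Gossip, and that there is a single upper bound on the duration 
of a Paxos round. `PendingEndpointAges`, `maxPaxosRound`, and the state names 
are hypothetical; times are passed in explicitly so the behaviour is easy to 
test.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical tracker of how long this coordinator has known each pending endpoint.
public class PendingEndpointAges {
    public enum PaxosState { PENDING_FOR_PAXOS, NOT_PENDING_FOR_PAXOS }

    private final Duration maxPaxosRound; // assumed upper bound on a Paxos round
    private final Map<String, Instant> firstSeen = new ConcurrentHashMap<>();

    public PendingEndpointAges(Duration maxPaxosRound) {
        this.maxPaxosRound = maxPaxosRound;
    }

    // Called when the coordinator learns of a pending endpoint; keeps the earliest time.
    public void onPendingEndpointSeen(String endpoint, Instant now) {
        firstSeen.putIfAbsent(endpoint, now);
    }

    // Called once the endpoint becomes a normal replica or stops being pending.
    public void onNoLongerPending(String endpoint) {
        firstSeen.remove(endpoint);
    }

    // Still pending for Paxos until it has been known for longer than one Paxos round.
    public PaxosState stateFor(String endpoint, Instant now) {
        Instant seen = firstSeen.get(endpoint);
        if (seen == null) {
            throw new IllegalArgumentException("not a known pending endpoint: " + endpoint);
        }
        Duration age = Duration.between(seen, now);
        return age.compareTo(maxPaxosRound) > 0
                ? PaxosState.NOT_PENDING_FOR_PAXOS
                : PaxosState.PENDING_FOR_PAXOS;
    }

    public static void main(String[] args) {
        PendingEndpointAges ages = new PendingEndpointAges(Duration.ofSeconds(10));
        Instant t0 = Instant.parse("2017-04-05T00:00:00Z");
        ages.onPendingEndpointSeen("D", t0);
        System.out.println(ages.stateFor("D", t0.plusSeconds(5)));  // PENDING_FOR_PAXOS
        System.out.println(ages.stateFor("D", t0.plusSeconds(11))); // NOT_PENDING_FOR_PAXOS
    }
}
```

Because only locally elapsed time matters here, a real implementation would 
probably want a monotonic clock rather than wall-clock `Instant`s.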

> Relax limit on number of pending endpoints during CAS
> -----------------------------------------------------
>
>                 Key: CASSANDRA-13419
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13419
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Coordination, CQL
>            Reporter: Ariel Weisberg
>            Assignee: Ariel Weisberg
>
> CASSANDRA-8346 avoids stale reads during CAS when checking the condition or 
> doing serial reads by disallowing more than one pending endpoint.
> It seems like it should be possible to allow more than one pending endpoint 
> by being smarter about who we read from during the QUORUM read or about the 
> state of pending nodes that are there for host replacement.
> Sylvain suggested 
> bq. Well, I guess things are working as they do for decently good reason 
> here. That said, thinking about it, it could be that the solution from 
> CASSANDRA-8346 is a bit of a big hammer: I believe it's enough to ensure that 
> we read from at least one replica that responded to PREPARE 'in the same 
> Paxos round'. But we have timeouts on the Paxos round, so it could be it is 
> possible to reduce drastically the time we consider a node pending for CAS so 
> that it's not a real problem in practice. Something like having a pending node 
> move to an "almost there" state before becoming a true replica, and staying in 
> that state for basically the max time of a Paxos round, and then Paxos might 
> be able to replace "pending" nodes by those "almost there" for PREPARE.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
