[ https://issues.apache.org/jira/browse/CASSANDRA-7489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051740#comment-14051740 ]

Benedict commented on CASSANDRA-7489:
-------------------------------------

AFAICT this new scheme suffers none of the problems mentioned in 
CASSANDRA-3620. That's not to say this is definitely foolproof, but I think it 
is worth exploring.

> Track lower bound necessary for a repair, live, without actually repairing
> --------------------------------------------------------------------------
>
>                 Key: CASSANDRA-7489
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7489
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Benedict
>              Labels: performance, repair
>
> We will need a few things in place to get this right, but it should be 
> possible to track, live, the current health of a single range across the 
> cluster. If we force an owning node to be the coordinator for an update (so 
> if a non-smart client sends a mutation to a non-owning node, it simply 
> proxies it on to an owning node to coordinate the update; the overhead of 
> this should become minimal as smart clients become the norm, and smart 
> clients scale up to cope with huge clusters), then each owner can maintain 
> the oldest timestamp of any update it has coordinated that has not been 
> acknowledged by every owning node it was propagated to. The minimum of all 
> of these for a range is the lower bound from which we need to either repair 
> or retain tombstones. With vnode file segregation we can mark an entire 
> vnode range as repaired up to the most recently determined healthy lower 
> bound.
> There are some subtleties with this, but it means tombstones can potentially 
> be cleared only minutes after they are generated, instead of days or weeks. 
> It also means repairs can be even more incremental, operating only over 
> ranges and time periods we know to be potentially out of sync.
> It will most likely need RAMP transactions in place, so that atomic batch 
> mutations are not serialized on non-owning nodes. Having owning nodes 
> coordinate updates is what ensures robustness in the case of a single node 
> failure: in that case all ranges owned by the failed node are considered to 
> have a lower bound of -Inf. Without this, a single node being down would 
> result in the entire cluster being considered out of sync.
> We will still need a short grace period for clients to send timestamps, and 
> we would have to outright reject any update that arrived with a timestamp 
> close to the expiry of that window. But that window could safely be just 
> minutes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
