[
https://issues.apache.org/jira/browse/CASSANDRA-7489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jonathan Ellis resolved CASSANDRA-7489.
---------------------------------------
Resolution: Won't Fix
This is a very complex change with lots of caveats and corner cases, and it
really doesn't give us all that much over hourly incremental repair. (Killing
tombstones after an hour vs. after a minute isn't that big a win when you're
not constantly performing major compactions.)
So, I'm glad we have this for the interesting ideas pile, but let's not push
that rock uphill in the near future.
> Track lower bound necessary for a repair, live, without actually repairing
> --------------------------------------------------------------------------
>
> Key: CASSANDRA-7489
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7489
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Benedict
> Labels: performance, repair
>
> We will need a few things in place to get this right, but it should be
> possible to track, live, the current health of a single range across the
> cluster. If we force an owning node to be the coordinator for an update (so
> if a non-smart client sends a mutation to a non-owning node, that node simply
> proxies it on to an owning node to coordinate the update; this should add
> minimal overhead once smart clients become the norm and scale up to cope
> with huge clusters), then each owner can maintain the oldest known timestamp
> for which it coordinated an update that was not acknowledged by every owning
> node it was propagated to. The minimum of these values across the owners of
> a range is the lower bound from which we need to either repair or retain
> tombstones (sketched in code below).
> With vnode file segregation we can mark an entire vnode range as repaired up
> to the most recently determined healthy lower bound.
> There are some subtleties with this, but it means tombstones can be cleared
> potentially only minutes after they are generated, instead of days or weeks.
> It also means repairs can be even more incremental, only operating over
> ranges and time periods we know to be potentially out of sync.
> It will most likely need RAMP transactions in place, so that atomic batch
> mutations are not serialized on non-owning nodes. The point of having owning
> nodes coordinate updates is to keep the scheme robust to a single node
> failure: in that case all ranges owned by the failed node are considered to
> have a lower bound of -Inf. Without this, a single node being down would
> result in the entire cluster being considered out of sync.
> We will still need a short grace period for clients to send timestamps, and
> we would have to outright reject any update that arrived with a timestamp
> too close to the expiry of that window. But that window could safely be just
> minutes.
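As a rough, hypothetical illustration of the bookkeeping described in the quoted
paragraphs: each owning coordinator remembers, per locally-owned range, the oldest
write timestamp it coordinated that has not yet been acknowledged by every owning
replica, and the range's repairable lower bound is the minimum of those values
across owners, collapsing to -Inf whenever an owner is unreachable. The Java below
is a minimal sketch, not Cassandra code: RepairLowerBoundTracker, onCoordinate,
onAck and rangeLowerBound are made-up names, and persistence, timestamp collisions
and how the per-owner values are exchanged between nodes are all glossed over.

{code}
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical per-owner bookkeeping for one locally-owned range; none of
// these names exist in the Cassandra codebase.
final class RepairLowerBoundTracker
{
    // Writes this node coordinated that some owning replica has not yet acked,
    // keyed by write timestamp. Identical timestamps are glossed over here.
    private final ConcurrentSkipListMap<Long, Set<UUID>> pendingAcks = new ConcurrentSkipListMap<>();

    /** Called when this owner coordinates a mutation covering its range. */
    void onCoordinate(long writeTimestamp, Set<UUID> owningReplicas)
    {
        Set<UUID> pending = ConcurrentHashMap.newKeySet();
        pending.addAll(owningReplicas);
        pendingAcks.put(writeTimestamp, pending);
    }

    /** Called when an owning replica acknowledges the propagated mutation. */
    void onAck(long writeTimestamp, UUID replica)
    {
        Set<UUID> pending = pendingAcks.get(writeTimestamp);
        if (pending == null)
            return;                                      // already fully acked
        pending.remove(replica);
        if (pending.isEmpty())
            pendingAcks.remove(writeTimestamp, pending); // no longer holds the bound back
    }

    /**
     * Oldest timestamp this owner coordinated that is not yet acked by every
     * owning replica; everything strictly below it is healthy from this owner's
     * point of view. With nothing in flight the bound can advance to "now"
     * (modulo the client grace window discussed above).
     */
    long localLowerBound(long now)
    {
        Map.Entry<Long, Set<UUID>> oldest = pendingAcks.firstEntry();
        return oldest == null ? now : oldest.getKey();
    }

    /**
     * The bound for the range is the minimum over all owners; a missing (down)
     * owner contributes -Inf, so a single failed node never lets tombstones be
     * cleared prematurely, it only delays clearing them.
     */
    static long rangeLowerBound(Iterable<Long> reportedOwnerBounds, int expectedOwners)
    {
        long min = Long.MAX_VALUE;
        int reporting = 0;
        for (long bound : reportedOwnerBounds)
        {
            min = Math.min(min, bound);
            reporting++;
        }
        return reporting < expectedOwners ? Long.MIN_VALUE : min;
    }
}
{code}

Tombstones in the range older than the resulting bound could then be dropped, and
incremental repair limited to data newer than it, subject to the client-timestamp
grace window described above.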