[
https://issues.apache.org/jira/browse/CASSANDRA-9667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jason Brown updated CASSANDRA-9667:
-----------------------------------
Fix Version/s: (was: 3.x)
> strongly consistent membership and ownership
> --------------------------------------------
>
> Key: CASSANDRA-9667
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9667
> Project: Cassandra
> Issue Type: New Feature
> Reporter: Jason Brown
> Labels: LWT, membership, ownership
>
> Currently, there is advice to users to "wait two minutes between adding new
> nodes" in order for new node tokens, et al, to propagate. Further, as there's
> no coordination amongst joining node wrt token selection, new nodes can end
> up selecting ranges that overlap with other joining nodes. This causes a lot
> of duplicate streaming from the existing source nodes as they shovel out the
> bootstrap data for those new nodes.
> This ticket proposes creating a mechanism that allows strongly consistent
> membership and ownership changes in cassandra such that changes are performed
> in a linearizable and safe manner. The basic idea is to use LWT operations
> over a global system table, and leverage the linearizability of LWT for
> ensuring the safety of cluster membership/ownership state changes. This work
> is inspired by Riak's claimant module.
> The existing workflows for node join, decommission, remove, replace, and
> range move (there may be others I'm not thinking of) will need to be modified
> to participate in this scheme, as well as changes to nodetool to enable them.
> Note: we distinguish between membership and ownership in the following ways:
> for membership we mean "a host in this cluster and it's state". For
> ownership, we mean "what tokens (or ranges) does each node own"; these nodes
> must already be a member to be assigned tokens.
> A rough draft sketch of how the 'add new node' workflow might look like is:
> new nodes would no longer create tokens themselves, but instead contact a
> member of a Paxos cohort (via a seed). The cohort member will generate the
> tokens and execute a LWT transaction, ensuring a linearizable change to the
> membership/ownership state. The updated state will then be disseminated via
> the existing gossip.
> As for joining specifically, I think we could support two modes: auto-mode
> and manual-mode. Auto-mode is for adding a single new node per LWT operation,
> and would require no operator intervention (much like today). In manual-mode,
> however, multiple new nodes could (somehow) signal their their intent to join
> to the cluster, but will wait until an operator executes a nodetool command
> that will trigger the token generation and LWT operation for all pending new
> nodes. This will allow us better range partitioning and will make the
> bootstrap streaming more efficient as we won't have overlapping range
> requests.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)