> What do you expect this to look like in practice? JSON representation of the 
> ring? Would reads and writes have halted? In what situations would the 
> database be entirely unavailable? 

The format is pretty much TBD, I'm afraid. A JSON representation, or one in an 
equivalent textual format, is certainly feasible though. The metadata state 
will include more than just the ring: schema, contact points/membership of the 
CMS itself, and the state of pending operations are all part of it. Tools 
analogous to the old sstable2json/json2sstable could be a minimal solution, but 
I'm sure we can do better.
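
Purely as an illustration of what a textual dump/load round trip might look 
like (the real format is undecided, and every field name and value below is a 
hypothetical placeholder rather than anything defined by the CEP), here is a 
minimal Python sketch:

```python
import json

# Hypothetical snapshot layout. All keys and values are illustrative
# assumptions; the actual metadata format has not been decided.
snapshot = {
    "epoch": 42,
    "schema": {"keyspaces": ["ks1"]},
    "cms_members": ["10.0.0.1", "10.0.0.2", "10.0.0.3"],
    "ring": {"10.0.0.1": [0], "10.0.0.2": [100], "10.0.0.3": [200]},
    "pending_operations": [{"type": "bootstrap", "node": "10.0.0.4"}],
}


def dump_snapshot(state: dict) -> str:
    """Serialize the metadata state to human-editable JSON text."""
    return json.dumps(state, indent=2, sort_keys=True)


def load_snapshot(text: str) -> dict:
    """Parse JSON text back into a metadata state dictionary."""
    return json.loads(text)


text = dump_snapshot(snapshot)
restored = load_snapshot(text)
```

The point is only that an operator could dump the state, edit it by hand, and 
load it back, in the same spirit as sstable2json/json2sstable.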

Regular read/write operations should not be halted, even by a total failure of 
the metadata service. There should be no situations where a previously 
stable database becomes entirely unavailable due to a CMS failure. The worst 
case is where some unavailability is caused by the permanent failure of 
multiple nodes, and those nodes happen to represent a majority of the CMS. In 
this scenario, the CMS would need to be recovered before the down nodes could 
be replaced, so it's possible it would extend the period of unavailability, 
though not necessarily by much.
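
The worst case above comes down to a simple majority check. A minimal Python 
sketch, assuming the CMS needs a strict majority of its members to make 
progress (the function names and node identifiers are hypothetical, not part of 
the proposal):

```python
def cms_has_quorum(cms_members: set, failed_nodes: set) -> bool:
    """True if a strict majority of CMS members is still alive."""
    live = cms_members - failed_nodes
    return len(live) > len(cms_members) // 2


def replacement_blocked(cms_members: set, failed_nodes: set) -> bool:
    """Node replacement is a metadata change, so in this sketch it must
    wait for CMS recovery whenever the CMS has lost its majority."""
    return not cms_has_quorum(cms_members, failed_nodes)


cms = {"n1", "n2", "n3"}

# Two of the three CMS members are lost: the CMS must be recovered first.
blocked_majority_lost = replacement_blocked(cms, {"n1", "n2"})

# One CMS member and one ordinary node are lost: replacement can proceed.
blocked_minority_lost = replacement_blocked(cms, {"n1", "n7"})
```

In neither case does the check affect regular reads and writes; it only governs 
whether metadata changes, such as replacing the down nodes, can be committed.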

> On 23 Aug 2022, at 05:42, Jeff Jirsa <jji...@gmail.com> wrote:
> “ The proposed mechanism for dealing with both of these failure types is to 
> enable a manual operator override mode. This would allow operators to inject 
> metadata changes (potentially overriding the complete metadata state) 
> directly on any and all nodes in a cluster. At the most extreme end of the 
> spectrum, this could allow an unrecoverably corrupt state to be rectified by 
> composing a custom snapshot of cluster metadata and uploading it to all nodes 
> in the cluster”
> What do you expect this to look like in practice? JSON representation of the 
> ring? Would reads and writes have halted? In what situations would the 
> database be entirely unavailable? 
>> On Aug 22, 2022, at 11:15 AM, Derek Chen-Becker <de...@chen-becker.org> 
>> wrote:
>> This looks really interesting; thanks for putting this together! Just so I'm 
>> clear on CEP nomenclature, having external management of metadata as a 
>> non-goal doesn't preclude some future use, correct? Coincidentally, I'm 
>> working on my ApacheCon talk on improving modularity in Cassandra and one of 
>> the ideas I'm discussing is pluggably (?) replacing gossip with something(s) 
>> that allow us to externalize some of the complexity of maintaining 
>> consistency. I need to digest the proposal you've made, but I don't see the 
>> two ideas being at odds on my first read. 
>> Cheers,
>> Derek
>> On Mon, Aug 22, 2022 at 6:45 AM Sam Tunnicliffe <s...@beobal.com> wrote:
>> Hi,
>> I'd like to open discussion about this CEP: 
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-21%3A+Transactional+Cluster+Metadata
>> Cluster metadata in Cassandra comprises a number of disparate elements 
>> including, but not limited to, distributed schema, topology and token 
>> ownership. Following the general design principles of Cassandra, the 
>> mechanisms for coordinating updates to cluster state have favoured eventual 
>> consistency, with probabilistic delivery via gossip being a prime example. 
>> Undoubtedly, this approach has benefits, not least in terms of resilience, 
>> particularly in highly fluid distributed environments. However, this is not 
>> the reality of most Cassandra deployments, where the total number of nodes 
>> is relatively small (i.e. in the low thousands) and the rate of change tends 
>> to be low.  
>> Historically, a significant proportion of issues affecting operators and 
>> users of Cassandra have been due, at least in part, to a lack of strongly 
>> consistent cluster metadata. In response to this, we propose a design which 
>> aims to provide linearizability of metadata changes whilst ensuring that the 
>> effects of those changes are made visible to all nodes in a strongly 
>> consistent manner. At its core, it is also pluggable, enabling 
>> Cassandra-derived projects to supply their own implementations if desired.
>> In addition to the CEP document itself, we aim to publish a working 
>> prototype of the proposed design. Obviously, this does not implement the 
>> entire proposal and there are several parts which remain only partially 
>> complete. It does include the core of the system, including a good deal of 
>> test infrastructure, so may serve as both illustration of the design and a 
>> starting point for real implementation. 
>> -- 
>> +---------------------------------------------------------------+
>> | Derek Chen-Becker                                             |
>> | GPG Key available at https://keybase.io/dchenbecker and       |
>> | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
>> | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
>> +---------------------------------------------------------------+
