Hey everybody, bumping this CEP from Ariel in case you'd like some weekend reading.
We’d like to finish witnesses and bring them out of “experimental” status now that Transactional Metadata and Mutation Tracking provide the building blocks needed to complete them.

Witnesses are part of a family of approaches in replicated storage systems to maintain or boost availability and durability while reducing storage costs. Log replicas are a close relative. Both are used by leading cloud databases – for instance, Spanner implements witness replicas [1] while DynamoDB implements log replicas [2].

Witness replicas are a great fit for topologies that replicate at greater than RF=3 – most commonly multi-DC/multi-region deployments. Today in Cassandra, all members of a voting quorum replicate all data forever. Witness replicas let users break this coupling: they allow voting quorums that are larger than the number of copies of data stored in perpetuity.

Take a 3-DC cluster replicated at RF=3 in each DC as an example. In this topology, Cassandra stores 9 copies of the database forever – huge storage amplification. Witnesses allow users to maintain a voting quorum of 9 members (3 per DC) while reducing the durable replicas to 2 per DC – i.e., two durable replicas and one witness per DC. This maintains the availability properties of an RF=3×3 topology while reducing storage costs by 33%, going from 9 copies to 6.

The role of a witness is to "witness" a write and persist it until it has been reconciled among all durable replicas, and to respond to read requests for witnessed writes awaiting reconciliation. Note that witnesses don't introduce a dedicated node role – whether a node is a durable replica or a witness for a token depends only on its position in the ring.

This CEP builds on CEP-45: Mutation Tracking to establish the safety property of the witness: guaranteeing that writes have been persisted to all durable replicas before becoming purgeable. CEP-45's journal and reconciliation design provide a great mechanism to ensure this while avoiding the write amplification of incremental repair and anticompaction.

Take a look at the CEP if you're interested – happy to answer questions and discuss further:
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-45%3A+Mutation+Tracking

– Scott

[1] https://cloud.google.com/spanner/docs/replication
[2] https://www.usenix.org/system/files/atc22-elhemali.pdf

> On Apr 25, 2025, at 8:21 AM, Ariel Weisberg <ar...@weisberg.ws> wrote:
>
> Hi all,
>
> The CEP is available here:
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=353601959
>
> We would like to propose CEP-46: Finish Transient Replication/Witnesses for
> adoption by the community. CEP-46 would rename transient replication to
> witnesses and leverage mutation tracking to implement witnesses as CEP-45
> mutation tracking based log replicas, replacing incremental repair (IR)
> based witnesses.
>
> For those not familiar with transient replication: the keyspace replication
> settings declare some replicas as transient, and when incremental repair
> runs, the transient replicas delete data instead of moving it into the
> repaired set.
>
> With log replicas, nodes only materialize mutations in their local LSM for
> ranges where they are full replicas and not witnesses. For witness ranges,
> a node will write mutations to its local mutation tracking log and
> participate in background and read-time reconciliation.
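To make the write path just described concrete, here's a rough, self-contained sketch. Every name in it (MutationJournal, Placement, and so on) is invented for illustration and does not correspond to actual Cassandra internals; it only shows the shape of the full-replica vs. witness branch:

    import java.util.ArrayList;
    import java.util.List;

    class WitnessWritePathSketch
    {
        record Mutation(long token, String payload) {}

        /** Stand-in for the CEP-45 mutation tracking journal. */
        static class MutationJournal
        {
            final List<Mutation> entries = new ArrayList<>();
            void append(Mutation m) { entries.add(m); }
        }

        /** Stand-in for the local LSM (memtables/sstables). */
        static class LocalStore
        {
            final List<Mutation> materialized = new ArrayList<>();
            void apply(Mutation m) { materialized.add(m); }
        }

        /** In reality this is determined by the node's position in the ring. */
        interface Placement
        {
            boolean isFullReplica(long token);
        }

        final MutationJournal journal = new MutationJournal();
        final LocalStore store = new LocalStore();
        final Placement placement;

        WitnessWritePathSketch(Placement placement) { this.placement = placement; }

        void applyMutation(Mutation m)
        {
            // Assumption per CEP-45: every replica, full or witness, appends to
            // its local journal so the write can be served to reads and
            // reconciled later.
            journal.append(m);

            // Only full replicas materialize the write into the local LSM;
            // witness ranges skip this, and the journal entry becomes purgeable
            // once reconciliation confirms all full replicas have the write.
            if (placement.isFullReplica(m.token()))
                store.apply(m);
        }

        public static void main(String[] args)
        {
            // Toy placement: even tokens are full-replica ranges, odd are witnessed.
            WitnessWritePathSketch node = new WitnessWritePathSketch(t -> t % 2 == 0);
            node.applyMutation(new Mutation(2, "journaled and materialized"));
            node.applyMutation(new Mutation(3, "journaled only (witness range)"));
            System.out.println("journal=" + node.journal.entries.size()
                             + " materialized=" + node.store.materialized.size());
        }
    }

Witnessed writes pay only the journal append; the LSM write and everything downstream of it are skipped: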
> This saves the compaction overhead of IR-based witnesses, which have to
> materialize and compact all mutations, even those applied to witness ranges.
>
> This would also address one of the biggest issues with witnesses: the lack
> of monotonic reads. In terms of implementation complexity, this would
> actually delete code compared to what would be required to complete IR-based
> witnesses, because most of the heavy lifting is already done by mutation
> tracking.
>
> Log replicas also make it much more practical to realize the cost savings of
> witnesses, because log replicas have easier-to-characterize resource
> consumption requirements (write rate * recovery/reconfiguration time) and
> target a 10x improvement in write throughput. This makes it safer and easier
> to know how much capacity can be omitted.
>
> Thanks,
> Ariel
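To put rough numbers on that last point: a witness's steady-state footprint for a range is approximately the write rate for that range multiplied by how long writes sit in the journal before reconciliation makes them purgeable. A back-of-the-envelope sketch, with entirely made-up numbers:

    class WitnessSizingSketch
    {
        public static void main(String[] args)
        {
            // Illustrative figures only – not recommendations or CEP targets.
            double writeBytesPerSec = 50.0 * 1024 * 1024;  // 50 MiB/s of witnessed writes
            double recoveryWindowSec = 4 * 3600;           // recovery/reconfiguration window: 4 hours
            double journalBytes = writeBytesPerSec * recoveryWindowSec;
            System.out.printf("Witness journal high-water mark: ~%.0f GiB%n",
                              journalBytes / (1024.0 * 1024 * 1024));  // ~703 GiB
        }
    }

A full replica, by contrast, retains every byte indefinitely – which is the sense in which log replicas make it safer and easier to know how much capacity can be omitted.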