This sounds like a valuable feature that will benefit a lot of folks by
cutting storage costs, particularly in large deployments.

I'd like to see a note in the CEP about documentation overhead, as this is
an important feature to communicate correctly, but that's just a nit. +1 on
moving forward with this overall.

On Sun, May 4, 2025 at 1:58 PM Jordan West <jw...@apache.org> wrote:

> I’m generally supportive. I can see the benefits of the concept, and I
> also think the current implementation adds a lot of complexity to the
> codebase for something stuck in experimental mode. It will be great to
> have a more robust version built on a better approach.
>
> On Sun, May 4, 2025 at 00:27 Benedict <bened...@apache.org> wrote:
>
>> +1
>>
>> This is an obviously good feature for operators that are storage-bound in
>> multi-DC deployments but want to retain their latency characteristics
>> during node maintenance. Log replicas are the right approach.
>>
>> > On 3 May 2025, at 23:42, sc...@paradoxica.net wrote:
>> >
>> > Hey everybody, bumping this CEP from Ariel in case you'd like some
>> weekend reading.
>> >
>> > We’d like to finish witnesses and bring them out of “experimental”
>> status now that Transactional Metadata and Mutation Tracking provide the
>> building blocks needed to complete them.
>> >
>> > Witnesses are part of a family of approaches in replicated storage
>> systems to maintain or boost availability and durability while reducing
>> storage costs. Log replicas are a close relative. Both are used by leading
>> cloud databases – for instance, Spanner implements witness replicas [1]
>> while DynamoDB implements log replicas [2].
>> >
>> > Witness replicas are a great fit for topologies that replicate at
>> greater than RF=3 – most commonly multi-DC/multi-region deployments. Today
>> in Cassandra, all members of a voting quorum replicate all data forever.
>> Witness replicas let users break this coupling: they allow one to define
>> voting quorums that are larger than the number of copies of the data
>> stored in perpetuity.
>> >
>> > Take a 3× DC cluster replicated at RF=3 in each DC as an example. In
>> this topology, Cassandra stores 9 copies of the database forever – huge
>> storage amplification. Witnesses allow users to maintain a voting quorum of
>> 9 members (3 per DC) but reduce the durable replicas to 2 per DC – e.g.,
>> two durable replicas and one witness. This maintains the availability
>> properties of an RF=3×3 topology while reducing storage costs by 33%, going
>> from 9 copies to 6.
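The storage math in the example above can be sketched in a few lines (the DC
count and replica counts below just restate the example's numbers; nothing
here comes from the CEP itself):

```python
# Back-of-the-envelope storage math for the 3-DC, RF=3-per-DC example.
dcs = 3
voting_members_per_dc = 3      # quorum participants per DC (unchanged by witnesses)
durable_replicas_per_dc = 2    # full copies per DC once one replica is a witness

full_copies_without_witnesses = dcs * voting_members_per_dc   # 9 copies forever
full_copies_with_witnesses = dcs * durable_replicas_per_dc    # 6 copies forever

savings = 1 - full_copies_with_witnesses / full_copies_without_witnesses
print(full_copies_without_witnesses, full_copies_with_witnesses, f"{savings:.0%}")
# 9 6 33%
```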
>> >
>> > The role of a witness is to "witness" a write and persist it until it
>> has been reconciled among all durable replicas; and to respond to read
>> requests for witnessed writes awaiting reconciliation. Note that witnesses
>> don't introduce a dedicated role for a node – whether a node is a durable
>> replica or witness for a token just depends on its position in the ring.
>> >
>> > This CEP builds on CEP-45: Mutation Tracking to establish the safety
>> property of the witness: guaranteeing that writes have been persisted to
>> all durable replicas before becoming purgeable. CEP-45's journal and
>> reconciliation design provide a great mechanism to ensure this while
>> avoiding the write amplification of incremental repair and anticompaction.
>> >
>> > Take a look at the CEP if you're interested - happy to answer questions
>> and discuss further:
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-45%3A+Mutation+Tracking
>> >
>> > – Scott
>> >
>> > [1] https://cloud.google.com/spanner/docs/replication
>> > [2] https://www.usenix.org/system/files/atc22-elhemali.pdf
>> >
>> >> On Apr 25, 2025, at 8:21 AM, Ariel Weisberg <ar...@weisberg.ws> wrote:
>> >>
>> >> Hi all,
>> >>
>> >> The CEP is available here:
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=353601959
>> >>
>> >> We would like to propose CEP-46: Finish Transient
>> Replication/Witnesses for adoption by the community. CEP-46 would rename
>> transient replication to witnesses and leverage mutation tracking to
>> implement witnesses as CEP-45 log replicas, replacing the incremental
>> repair based implementation.
>> >>
>> >> For those not familiar with transient replication: the keyspace
>> replication settings declare some replicas as transient, and when
>> incremental repair runs, the transient replicas delete data instead of
>> moving it into the repaired set.
>> >>
>> >> With log replicas, nodes only materialize mutations in their local LSM
>> for ranges where they are full replicas and not witnesses. For witness
>> ranges, a node writes mutations to its local mutation tracking log and
>> participates in background and read-time reconciliation. This saves the
>> compaction overhead of IR-based witnesses, which have to materialize and
>> compact all mutations, even those applied to witness ranges.
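The write-path split described above can be sketched roughly as follows. This
is an illustrative toy, not Cassandra's actual internals; every name here
(`Node`, `apply`, the range tuples) is hypothetical:

```python
# Toy sketch of a log-replica write path: every tracked mutation is
# journaled, but only durable-replica ranges are materialized in the LSM.
class Node:
    def __init__(self, durable_ranges, witness_ranges):
        self.durable_ranges = durable_ranges  # token ranges fully replicated here
        self.witness_ranges = witness_ranges  # token ranges only witnessed here
        self.log = []        # stand-in for the mutation tracking journal
        self.memtable = {}   # stand-in for the local LSM

    def apply(self, token, mutation):
        # All mutations for tracked ranges go to the journal first.
        self.log.append((token, mutation))
        # Materialize only if this node is a durable replica for the token;
        # witness ranges skip the LSM (and thus compaction) entirely.
        if any(lo <= token < hi for lo, hi in self.durable_ranges):
            self.memtable.setdefault(token, []).append(mutation)

n = Node(durable_ranges=[(0, 100)], witness_ranges=[(100, 200)])
n.apply(42, "w1")    # durable range: journaled and materialized
n.apply(150, "w2")   # witness range: journaled only
```

The point of the sketch is the asymmetry: both writes hit the journal, but only
the first reaches the memtable, which is where the compaction savings come from.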
>> >>
>> >> This would address one of the biggest issues with witnesses, which is
>> the lack of monotonic reads. In terms of implementation complexity, this
>> would actually delete code compared to what would be required to complete
>> IR-based witnesses, because most of the heavy lifting is already done by
>> mutation tracking.
>> >>
>> >> Log replicas also make it much more practical to realize the cost
>> savings of witnesses, because log replicas have easier-to-characterize
>> resource consumption requirements (write rate * recovery/reconfiguration
>> time) and target a 10x improvement in write throughput. This makes knowing
>> how much capacity can be omitted safer and easier.
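The "write rate * recovery/reconfiguration time" sizing above can be
illustrated with made-up numbers (the 50 MB/s rate and 6-hour window below are
hypothetical inputs, not figures from the CEP):

```python
# Hypothetical capacity estimate for a witness's log storage: the log only
# needs to hold writes until reconciliation completes, so its footprint is
# bounded by write rate * worst-case recovery/reconfiguration window.
write_rate_mb_per_s = 50           # sustained writes to witnessed ranges (assumed)
recovery_window_s = 6 * 3600       # worst-case reconciliation window (assumed)

log_capacity_gb = write_rate_mb_per_s * recovery_window_s / 1024
print(f"{log_capacity_gb:.0f} GiB")   # ~1055 GiB
```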
>> >>
>> >> Thanks,
>> >> Ariel
>> >
>>
>
