This sounds like a modern feature that will help a lot of folks cut storage costs, particularly in large deployments.
I'd like to see a note on the CEP about documentation overhead, as this is an important feature to communicate correctly, but that's just a nit. +1 on moving forward with this overall.

On Sun, May 4, 2025 at 1:58 PM Jordan West <jw...@apache.org> wrote:

> I’m generally supportive. The concept is one that I can see the benefits
> of, and I also think the current implementation adds a lot of complexity to
> the codebase for being stuck in experimental mode. It will be great to have
> a more robust version built on a better approach.
>
> On Sun, May 4, 2025 at 00:27 Benedict <bened...@apache.org> wrote:
>
>> +1
>>
>> This is an obviously good feature for operators that are storage-bound in
>> multi-DC deployments but want to retain their latency characteristics
>> during node maintenance. Log replicas are the right approach.
>>
>> > On 3 May 2025, at 23:42, sc...@paradoxica.net wrote:
>> >
>> > Hey everybody, bumping this CEP from Ariel in case you'd like some
>> > weekend reading.
>> >
>> > We’d like to finish witnesses and bring them out of “experimental”
>> > status now that Transactional Metadata and Mutation Tracking provide the
>> > building blocks needed to complete them.
>> >
>> > Witnesses are part of a family of approaches in replicated storage
>> > systems to maintain or boost availability and durability while reducing
>> > storage costs. Log replicas are a close relative. Both are used by leading
>> > cloud databases – for instance, Spanner implements witness replicas [1]
>> > while DynamoDB implements log replicas [2].
>> >
>> > Witness replicas are a great fit for topologies that replicate at
>> > greater than RF=3, most commonly multi-DC/multi-region deployments. Today
>> > in Cassandra, all members of a voting quorum replicate all data forever.
>> > Witness replicas let users break this coupling: they allow one to define
>> > voting quorums that are larger than the number of copies of data that are
>> > stored in perpetuity.
>> > Take a 3-DC cluster replicated at RF=3 in each DC as an example. In
>> > this topology, Cassandra stores 9 copies of the database forever - huge
>> > storage amplification. Witnesses allow users to maintain a voting quorum
>> > of 9 members (3 per DC) but reduce the durable replicas to 2 per DC,
>> > e.g., two durable replicas and one witness. This maintains the
>> > availability properties of an RF=3×3 topology while reducing storage
>> > costs by 33%, going from 9 copies to 6.
>> >
>> > The role of a witness is to "witness" a write and persist it until it
>> > has been reconciled among all durable replicas, and to respond to read
>> > requests for witnessed writes awaiting reconciliation. Note that
>> > witnesses don't introduce a dedicated role for a node – whether a node is
>> > a durable replica or a witness for a token depends only on its position
>> > in the ring.
>> >
>> > This CEP builds on CEP-45: Mutation Tracking to establish the safety
>> > property of the witness: guaranteeing that writes have been persisted to
>> > all durable replicas before becoming purgeable. CEP-45's journal and
>> > reconciliation design provide a great mechanism to ensure this while
>> > avoiding the write amplification of incremental repair and
>> > anticompaction.
>> >
>> > Take a look at the CEP if you're interested - happy to answer questions
>> > and discuss further:
>> > https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-45%3A+Mutation+Tracking
>> >
>> > – Scott
>> >
>> > [1] https://cloud.google.com/spanner/docs/replication
>> > [2] https://www.usenix.org/system/files/atc22-elhemali.pdf
>> >
>> >> On Apr 25, 2025, at 8:21 AM, Ariel Weisberg <ar...@weisberg.ws> wrote:
>> >>
>> >> Hi all,
>> >>
>> >> The CEP is available here:
>> >> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=353601959
>> >>
>> >> We would like to propose CEP-46: Finish Transient
>> >> Replication/Witnesses for adoption by the community.
>> >> CEP-46 would rename transient replication to witnesses and leverage
>> >> mutation tracking to implement witnesses as CEP-45 Mutation Tracking
>> >> based log replicas, replacing incremental repair based witnesses.
>> >>
>> >> For those not familiar with transient replication: the keyspace
>> >> replication settings declare some replicas as transient, and when
>> >> incremental repair runs, the transient replicas delete data instead of
>> >> moving it into the repaired set.
>> >>
>> >> With log replicas, nodes only materialize mutations in their local LSM
>> >> tree for ranges where they are full replicas and not witnesses. For
>> >> witness ranges, a node writes mutations to its local mutation tracking
>> >> log and participates in background and read-time reconciliation. This
>> >> saves the compaction overhead of IR-based witnesses, which have to
>> >> materialize and compact all mutations, even those applied to witness
>> >> ranges.
>> >>
>> >> This would address one of the biggest issues with witnesses, which is
>> >> the lack of monotonic reads. In terms of implementation complexity, this
>> >> would actually delete code compared to what would be required to
>> >> complete IR-based witnesses, because most of the heavy lifting is
>> >> already done by mutation tracking.
>> >>
>> >> Log replicas also make it much more practical to realize the cost
>> >> savings of witnesses, because log replicas have easier-to-characterize
>> >> resource consumption requirements (write rate * recovery/reconfiguration
>> >> time) and target a 10x improvement in write throughput. This makes
>> >> knowing how much capacity can be omitted safer and easier.
>> >>
>> >> Thanks,
>> >> Ariel
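[Editor's note: the storage arithmetic in the thread (the 9-copy to 6-copy example) and Ariel's "write rate * recovery/reconfiguration time" sizing rule can be sketched as below. This is a minimal illustration only; the function names and the simple linear capacity model are the editor's, not part of the CEP.]

```python
# Illustrative sketch of the witness-replica storage math from the thread.
# A 3-DC cluster at RF=3 per DC stores 9 full copies; replacing one replica
# per DC with a witness keeps a 9-member voting quorum but only 6 durable
# copies. All names here are hypothetical, for illustration only.

def voting_quorum_members(dcs: int, rf_per_dc: int) -> int:
    """Witnesses still vote, so quorum membership is unchanged."""
    return dcs * rf_per_dc

def durable_copies(dcs: int, rf_per_dc: int, witnesses_per_dc: int = 0) -> int:
    """Fully materialized (durable) copies stored across the cluster."""
    return dcs * (rf_per_dc - witnesses_per_dc)

def storage_savings(dcs: int, rf_per_dc: int, witnesses_per_dc: int) -> float:
    """Fraction of durable storage saved by using witnesses."""
    before = durable_copies(dcs, rf_per_dc)
    after = durable_copies(dcs, rf_per_dc, witnesses_per_dc)
    return 1 - after / before

# The example from the thread: 3 DCs, RF=3 each, one witness per DC.
assert voting_quorum_members(3, 3) == 9                 # quorum stays at 9
assert durable_copies(3, 3) == 9                        # no witnesses: 9 copies
assert durable_copies(3, 3, witnesses_per_dc=1) == 6    # with witnesses: 6 copies
assert round(storage_savings(3, 3, 1), 2) == 0.33       # ~33% storage saved

# Rough witness log sizing per Ariel's note: capacity scales with
# write rate * recovery/reconfiguration window (a simplified model).
def witness_log_bytes(write_bytes_per_sec: float, window_sec: float) -> float:
    return write_bytes_per_sec * window_sec

# e.g. 50 MB/s of writes retained across a 1-hour reconciliation window:
assert witness_log_bytes(50e6, 3600) == 180e9           # 180 GB of log capacity
```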