+1 (nb)

On Tue, 6 May 2025 at 17:32, Aleksey Yeshchenko <alek...@apple.com> wrote:
> +1
>
> On 5 May 2025, at 23:24, Blake Eggleston <bl...@ultrablake.com> wrote:
>
> As mutation tracking relates to existing backup systems that account for repaired vs unrepaired sstables: mutation tracking will continue to promote sstables to repaired once we know they contain data that has been fully reconciled. The main difference is that they won’t be promoted as part of an explicit range repair, but by compaction, as they become eligible for promotion.
>
> (also +1 to finishing witnesses)
>
> On Mon, May 5, 2025, at 11:45 AM, Benedict Elliott Smith wrote:
>
> Consistent backup/restore is a fundamentally hard and unsolved problem for Cassandra today (without any of the mentioned features). In particular, we break the real-time guarantee of the linearizability property (most notably for LWTs) between partitions for any backup/restore process today.
>
> Fixing this should be relatively straightforward for Accord, and is something we intend to address in follow-up work. Fixing it for eventually consistent (or Paxos/LWT) operations is, I think, achievable with or without mutation tracking (probably easier with mutation tracking). I’m not aware of any plans to try to tackle this, though.
>
> Witness replicas should not particularly matter to any of the above.
>
> On 5 May 2025, at 18:49, Jon Haddad <j...@rustyrazorblade.com> wrote:
>
> It took me a bit to wrap my head around how this works, but now that I think I understand the idea, it sounds like a solid improvement. Being able to achieve the same results as quorum while costing 1/3 less is a *big deal*, and I know several teams that would be interested.
>
> One thing I'm curious about (and we can break it out into a separate discussion) is how all the functionality that requires coordination and global state (repaired vs non-repaired) will affect backups. Without a synchronization primitive to take a cluster-wide snapshot, how can we safely restore from eventually consistent backups without risking consistency issues due to out-of-sync repaired status?
>
> I don't think we need to block any of the proposed work on this - it's just something that's been nagging at me, and I don't know enough about the nuances of Accord, Mutation Tracking, or Witness Replicas to say whether it affects things or not. If it does, let's make sure we have it documented [1].
>
> Jon
>
> [1] https://cassandra.apache.org/doc/latest/cassandra/managing/operating/backups.html
>
> On Mon, May 5, 2025 at 10:21 AM Nate McCall <zznat...@gmail.com> wrote:
>
> This sounds like a modern feature that will benefit a lot of folks by cutting storage costs, particularly in large deployments.
>
> I'd like to see a note on the CEP about documentation overhead, as this is an important feature to communicate correctly, but that's just a nit. +1 on moving forward with this overall.
>
> On Sun, May 4, 2025 at 1:58 PM Jordan West <jw...@apache.org> wrote:
>
> I’m generally supportive. The concept is one I can see the benefits of, and I also think the current implementation adds a lot of complexity to the codebase for being stuck in experimental mode. It will be great to have a more robust version built on a better approach.
>
> On Sun, May 4, 2025 at 00:27 Benedict <bened...@apache.org> wrote:
>
> +1
>
> This is an obviously good feature for operators that are storage-bound in multi-DC deployments but want to retain their latency characteristics during node maintenance. Log replicas are the right approach.
>
> On 3 May 2025, at 23:42, sc...@paradoxica.net wrote:
>
> > Hey everybody, bumping this CEP from Ariel in case you'd like some weekend reading.
> >
> > We’d like to finish witnesses and bring them out of “experimental” status now that Transactional Metadata and Mutation Tracking provide the building blocks needed to complete them.
> >
> > Witnesses are part of a family of approaches in replicated storage systems for maintaining or boosting availability and durability while reducing storage costs. Log replicas are a close relative. Both are used by leading cloud databases – for instance, Spanner implements witness replicas [1], while DynamoDB implements log replicas [2].
> >
> > Witness replicas are a great fit for topologies that replicate at greater than RF=3 –– most commonly multi-DC/multi-region deployments. Today in Cassandra, all members of a voting quorum replicate all data forever. Witness replicas let users break this coupling: they allow one to define voting quorums that are larger than the number of copies of data stored in perpetuity.
> >
> > Take a 3× DC cluster replicated at RF=3 in each DC as an example. In this topology, Cassandra stores 9× copies of the database forever - huge storage amplification. Witnesses allow users to maintain a voting quorum of 9 members (3× per DC) while reducing the durable replicas to 2× per DC – e.g., two durable replicas and one witness. This maintains the availability properties of an RF=3×3 topology while reducing storage costs by 33%, going from 9× copies to 6×.
> >
> > The role of a witness is to "witness" a write and persist it until it has been reconciled among all durable replicas, and to respond to read requests for witnessed writes awaiting reconciliation. Note that witnesses don't introduce a dedicated role for a node – whether a node is a durable replica or a witness for a token just depends on its position in the ring.
> >
> > This CEP builds on CEP-45: Mutation Tracking to establish the safety property of the witness: guaranteeing that writes have been persisted to all durable replicas before becoming purgeable. CEP-45's journal and reconciliation design provides a great mechanism to ensure this while avoiding the write amplification of incremental repair and anticompaction.
> >
> > Take a look at the CEP if you're interested - happy to answer questions and discuss further: https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-45%3A+Mutation+Tracking
> >
> > – Scott
> >
> > [1] https://cloud.google.com/spanner/docs/replication
> > [2] https://www.usenix.org/system/files/atc22-elhemali.pdf
> >
> >> On Apr 25, 2025, at 8:21 AM, Ariel Weisberg <ar...@weisberg.ws> wrote:
> >>
> >> Hi all,
> >>
> >> The CEP is available here: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=353601959
> >>
> >> We would like to propose CEP-46: Finish Transient Replication/Witnesses for adoption by the community. CEP-46 would rename transient replication to witnesses and leverage mutation tracking to implement witnesses as CEP-45 Mutation Tracking based log replicas, replacing incremental repair based witnesses.
> >>
> >> For those not familiar with transient replication: the keyspace replication settings declare some replicas as transient, and when incremental repair runs, the transient replicas delete data instead of moving it into the repaired set.
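[For reference, a minimal sketch of what these replication settings look like with today's experimental transient replication syntax, where each datacenter's setting is written as '<total_replicas>/<transient_replicas>'. The keyspace and DC names are illustrative, and CEP-46 may change this interface as part of the rename to witnesses:

  CREATE KEYSPACE demo_ks WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'DC1': '3/1',  -- 3 replicas in this DC, of which 1 is transient:
    'DC2': '3/1',  -- i.e. 2 durable replicas + 1 transient/witness
    'DC3': '3/1'
  };

This matches Scott's example above: a 9-member voting quorum with 6 durable copies. Note that transient replication must also be enabled in cassandra.yaml before a keyspace like this can be created.]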
> >>
> >> With log replicas, nodes only materialize mutations in their local LSM tree for ranges where they are full replicas and not witnesses. For witness ranges, a node writes mutations to its local mutation tracking log and participates in background and read-time reconciliation. This saves the compaction overhead of IR-based witnesses, which have to materialize and perform compaction on all mutations, even those applied to witness ranges.
> >>
> >> This would address one of the biggest issues with witnesses, which is the lack of monotonic reads. Implementation-complexity-wise, this would actually delete code compared to what would be required to complete IR-based witnesses, because most of the heavy lifting is already done by mutation tracking.
> >>
> >> Log replicas also make it much more practical to realize the cost savings of witnesses, because log replicas have easier-to-characterize resource consumption requirements (write rate × recovery/reconfiguration time) and target a 10× improvement in write throughput. This makes knowing how much capacity can be omitted safer and easier.
> >>
> >> Thanks,
> >> Ariel

--
Dmitry Konstantinov
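[As a rough, illustrative reading of Ariel's sizing formula above – the numbers here are hypothetical, not from the CEP. A witness's log capacity requirement is approximately:

  log capacity ≈ write rate × recovery/reconfiguration window

So a node absorbing, say, 20 MB/s of writes for its witness ranges, with a 4-hour (14,400 s) recovery window, would need on the order of 20 MB/s × 14,400 s ≈ 288 GB of log space – bounded and predictable, versus a full replica that must durably store its entire share of the dataset in perpetuity.]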