Setting aside the Paxos vs. Accord conversation (though admittedly my first question would have been “why not Accord”), I’m curious how folks who have thought about this are thinking about the correctness of repair.

I ask because I have seen far more data resurrection cases than lost write cases. Would repair here propagate that resurrection? Is that the expected primary behavior? I know repair also propagates resurrection in many cases (once tombstones purge), but has anyone running MVs in real life seen mismatches caused by lost writes instead of by something else (like resurrection)?


On May 8, 2025, at 5:44 PM, Runtian Liu <curly...@gmail.com> wrote:



Here’s my perspective:

#1 Accord vs. LWT round trips

Based on the insights shared by the Accord experts, it appears that implementing MVs with Accord can achieve a number of round trips comparable to the LWT solution proposed in CEP-48. Additionally, it seems that Accord might need fewer WAN RTTs than the LWT solution. This suggests that Accord is equivalent to or better than LWT in terms of performance for CEP-48.

Given this, it seems appropriate to set aside performance as a deciding factor when evaluating LWT versus Accord. I've also updated the CEP-48 page to reflect this clarification.

#2 Accord vs. LWT current state

Accord 

Accord is poised to significantly reshape Apache Cassandra's future and stands out as one of the most impactful developments on the horizon. The community is genuinely excited about its potential.

That said, the recent mailing list update on Accord (CEP-15) highlights that substantial work remains to mature the protocol entirely. In addition, real-world testing is still needed to validate its readiness. Beyond that, users will require additional time to evaluate and adopt Cassandra 6.x in their environments.

LWT

On the other hand, LWT is proven and has been running in production at scale for many years.

#3 Dev work for CEP-48

The CEP-48 design has two major components.

  1. Online path (CQL Mutations)

This section focuses on the LWT code path where any mutation to a base table (via CQL insert, update, or delete) reliably triggers the corresponding materialized view (MV) update. The development effort required for this part is relatively limited, accounting for approximately 30% of the total work.

If we need to implement this on Accord, the effort would be similar to the LWT implementation.

  2. Offline path (MV Data Repair)

The MV repair tool in Cassandra is intended to address inconsistencies that may occur in materialized views due to various factors. This component is the most complex and demanding part of the development effort, representing roughly 70% of the overall work.

#4 Accord is mentioned as a Future Alternative in CEP-48

Accord has always been top of mind, and we genuinely appreciate the thought and effort that has gone into its design and implementation. We’re excited about the changes, and if you look at the CEP-48 proposal, Accord is listed as a 'Future Alternative', not as a 'Rejected Alternative', to make clear that we continue to see value in its approach and are not opposed to it.


Based on #1, #2, #3, and #4, here is my thinking:

Scenario #1: CEP-15 takes longer to reach production than CEP-48 takes to merge

Since we're starting with LWT, there is no dependency on the progress of CEP-15. This means the community can benefit from CEP-48 independently of CEP-15's timeline. Additionally, it's possible to backport the changes from trunk to the current broadly adopted Cassandra release (4.1.x), enabling adoption before upgrading to 6.x.

Scenario #2: CEP-15 is production-qualified before CEP-48 merges

As noted in #3, developing on top of Accord is a relatively small part of the overall CEP-48 scope. Therefore, we can implement using Accord before merging CEP-48 into trunk, allowing us to forgo the LWT-based approach.

Given that the work required to support Accord later is relatively limited, and that starting with LWT avoids a dependency on a feature that is still maturing, proceeding with LWT is the most reliable path forward. Please feel free to share your thoughts.



On Thu, May 8, 2025 at 9:00 AM Jon Haddad <j...@rustyrazorblade.com> wrote:
Based on David and Blake’s responses, it sounds like we don’t need to block on anything. 

I realize you may be making a broader point, but in this instance it sounds like there’s nothing here preventing an Accord-based MV implementation. Now that I understand more about how it would be done, it also sounds a lot simpler.




On Thu, May 8, 2025 at 8:50 AM Josh McKenzie <jmcken...@apache.org> wrote:

IMHO, focus should be on accord-based MVs.  Even if that means it's blocked on first adding support for multiple conditions.

Strongly disagree here. We should develop features to be as loosely coupled w/one another as possible w/an eye towards future compatibility and leverage but not block development of one functionality on something else unless absolutely required for the feature to work (I'm defining "work" here as "hits user requirements with affordances consistent w/the rest of our ecosystem").

With the logic of deferring to another feature, it would have been quite reasonable for someone to make this same statement back in fall of '23 when we were discussing delaying 5.0 for Accord's merge. But things come up, the space we're in is complex, and cutting edge distributed things are Hard.


On Thu, May 8, 2025, at 11:13 AM, Mick Semb Wever wrote:

 
Curious what others think though.  I'm +1 on the spirit of getting MVs to a stable point, but not convinced this is the best approach.




IMHO, focus should be on accord-based MVs.  Even if that means it's blocked on first adding support for multiple conditions.

