Final update: David has completed a second rebase after we reached parity with trunk on our CI, and has confirmed tests remain stable. So I expect CEP-15 to merge to trunk sometime today.
No doubt there will be some unexpected disruption to others after a patch like this lands. Reach out via slack if you have any trouble.

> On 16 Mar 2025, at 10:44, Benedict Elliott Smith <bened...@apache.org> wrote:
>
> Hi everyone,
>
> To update you: the last patches we considered blockers have landed in the
> cep-15-accord branch. Caleb has now started rebasing the branch onto trunk. I
> expect there will be a few failing tests still to resolve at that point, but
> once they have been squashed we will proceed with the merge.
>
> There remains more work to do before release, and I will publish a detailed
> roadmap to Jira when I’m back in a couple of weeks.
>
>
>> On 11 Mar 2025, at 20:12, Nate McCall <zznat...@gmail.com> wrote:
>>
>> It sounds like we are all pretty interested in seeing this feature land and
>> the branch maintenance is causing overhead that could be spent on
>> finalisation. +1 on merging, particularly given the feature flag work.
>>
>> Once more unto the breach 💪
>>
>> On Fri, 7 Mar 2025 at 6:56 PM, Benedict <bened...@apache.org
>> <mailto:bened...@apache.org>> wrote:
>>> There are essentially three possible timelines to choose from here:
>>>
>>> 1) We agree in the next few days to merge to trunk. We will then prioritise
>>> rebasing onto trunk and resolving any pre-merge items starting next week.
>>> 2) There’s some more debate and agreement to merge to trunk in a week or
>>> two. In the meantime we will shift to internal-first development but we’ll
>>> likely prioritise the above work as soon as we can, which may be in a few
>>> weeks, so we can shift to trunk first development.
>>> 3) We don’t agree to merge accord anytime soon, so we shift to
>>> internal-first development for the time being. I’m not sure when we will
>>> prioritise any of the above.
>>>
>>> Our resources are finite and we’ve exhausted them (literally), so it’s
>>> pretty much pick one of the above.
>>> I don’t really mind which you pick, but
>>> I won’t personally be prioritising merge after this third attempt.
>>>
>>>> On 6 Mar 2025, at 22:01, Jon Haddad <j...@rustyrazorblade.com
>>>> <mailto:j...@rustyrazorblade.com>> wrote:
>>>>
>>>> Hmm... I took a look at the cep-15-accord branch in GitHub, it looks like
>>>> it's several hundred commits behind trunk. Since you'll need to rebase
>>>> again before merge *anyways*, would it make sense to do it once more, and
>>>> I can publish easy-cass-lab with the latest branch? If folks have
>>>> concerns, it's easy to fire up a cluster (I do it constantly) and try it
>>>> out.
>>>>
>>>> I think if we were to do this, out of consideration we should time box the
>>>> amount of time for an evaluation and unless someone raises an objection,
>>>> consider lazy consensus achieved.
>>>>
>>>> Jon
>>>>
>>>> On Thu, Mar 6, 2025 at 12:46 PM Benedict Elliott Smith
>>>> <bened...@apache.org <mailto:bened...@apache.org>> wrote:
>>>>> Because we want to validate against the latest code in trunk, else we are
>>>>> validating stale behaviours. The cost of rebasing is high, so we do not
>>>>> do it frequently. That means we will likely stop developing OSS-first, as
>>>>> the focus will have to move to our internal branch that satisfies these
>>>>> criteria.
>>>>>
>>>>> Exactly what this might be for upstreaming I cannot say. Personally, I
>>>>> aim to work exclusively on the branch we are stabilising. If that is not
>>>>> trunk, the latency for my contributions being made public might be high,
>>>>> as I have a huge imbalance of over-investment to recoup, and anything
>>>>> unnecessary will be deferred.
>>>>>
>>>>> Since the feature is disabled, and the code is almost entirely isolated,
>>>>> I cannot imagine the cost to the community of removing this work would be
>>>>> very high. But, I do not intend to argue Accord’s case here. I will let
>>>>> you all decide.
>>>>>
>>>>> Please decide soon though, as it shapes our work planning. The positive
>>>>> reception so far had led me to consider prioritising a move to
>>>>> trunk-first development within the next week or two, and the associated
>>>>> work that entails. However, if that was optimistic we will have to shift
>>>>> our plans.
>>>>>
>>>>>> On 6 Mar 2025, at 20:16, Jordan West <jw...@apache.org
>>>>>> <mailto:jw...@apache.org>> wrote:
>>>>>>
>>>>>> The work and effort in accord has been amazing. And I’m sure it sets a
>>>>>> new standard for code quality and correctness testing which I’m also
>>>>>> entirely behind. I also trust the folks working on it want to take it to
>>>>>> a fully production-ready solution. But I’m worried about
>>>>>> circumstances out of our control leaving us with a very complex feature
>>>>>> that isn’t complete.
>>>>>>
>>>>>> I do have some questions. Could folks help me better understand why
>>>>>> testing real workloads necessitates a merge (my understanding from the
>>>>>> original reason is this is the impetus for why we would merge now)? Also
>>>>>> I think the performance and schema change caveats are rather large ones.
>>>>>> One of Accord’s promises was better performance and I think making schema
>>>>>> changes with nodes down not being supported is a big gap. Could we have
>>>>>> some criteria like “supports all the operations PaxosV2 supports” or
>>>>>> “performs as well or better than PaxosV2 on [workload(s)]”?
>>>>>>
>>>>>> I understand waiting asks a lot of the authors in terms of bearing the
>>>>>> burden of a more complex merge. But I think we also need to consider
>>>>>> what merging is asking the community to bear if the worst happens and we
>>>>>> are unable to take the feature from its current state to something that
>>>>>> can be widely used in production.
>>>>>>
>>>>>> Jordan
>>>>>>
>>>>>> On Wed, Mar 5, 2025 at 15:52 Blake Eggleston <bl...@ultrablake.com
>>>>>> <mailto:bl...@ultrablake.com>> wrote:
>>>>>>> +1 to merging it
>>>>>>>
>>>>>>> On Wed, Mar 5, 2025, at 12:22 PM, Patrick McFadin wrote:
>>>>>>>> You have my +1
>>>>>>>>
>>>>>>>> On Wed, Mar 5, 2025 at 12:16 PM Benedict <bened...@apache.org
>>>>>>>> <mailto:bened...@apache.org>> wrote:
>>>>>>>> >
>>>>>>>> > Correct, these caveats should only apply to tables that have
>>>>>>>> > opted-in to accord.
>>>>>>>> >
>>>>>>>> > On 5 Mar 2025, at 20:08, Jeremiah Jordan <jerem...@apache.org
>>>>>>>> > <mailto:jerem...@apache.org>> wrote:
>>>>>>>> >
>>>>>>>> > So great to see all this hard work about to pay off!
>>>>>>>> >
>>>>>>>> > On the questions/concerns front, the only concern I would have
>>>>>>>> > towards merging this to trunk is if any of the caveats apply when
>>>>>>>> > someone is not using Accord. Assuming they only apply when the
>>>>>>>> > feature flag is enabled, I see no reason not to get this merged into
>>>>>>>> > trunk once everyone involved is happy with the state of it.
>>>>>>>> >
>>>>>>>> > -Jeremiah
>>>>>>>> >
>>>>>>>> > On Mar 5, 2025 at 12:15:23 PM, Benedict Elliott Smith
>>>>>>>> > <bened...@apache.org <mailto:bened...@apache.org>> wrote:
>>>>>>>> >>
>>>>>>>> >> That depends on all of you lovely people :D
>>>>>>>> >>
>>>>>>>> >> I think we should have finished merging everything we want before
>>>>>>>> >> QA by ~Monday; certainly not much later.
>>>>>>>> >>
>>>>>>>> >> I think we have some upgrade and python dtest failures to address
>>>>>>>> >> as well.
>>>>>>>> >>
>>>>>>>> >> So it could be pretty soon if the community is supportive.
>>>>>>>> >>
>>>>>>>> >> On 5 Mar 2025, at 17:22, Patrick McFadin <pmcfa...@gmail.com
>>>>>>>> >> <mailto:pmcfa...@gmail.com>> wrote:
>>>>>>>> >>
>>>>>>>> >> What is the timing for starting the merge process?
>>>>>>>> >> I'm asking because I have (yet another) presentation and this would be
>>>>>>>> >> a cool update.
>>>>>>>> >>
>>>>>>>> >> On Wed, Mar 5, 2025 at 1:22 AM Benedict Elliott Smith
>>>>>>>> >> <bened...@apache.org <mailto:bened...@apache.org>> wrote:
>>>>>>>> >> >
>>>>>>>> >> > Thanks everyone.
>>>>>>>> >> >
>>>>>>>> >> > Jon - your help will be greatly appreciated. We’ll let you know
>>>>>>>> >> > when we’ve got the cycles to invest in performance work
>>>>>>>> >> > (hopefully fairly soon). I expect the first step will be
>>>>>>>> >> > improving visibility so we can better understand what the system
>>>>>>>> >> > is doing (particularly the caching layers), but we can dig in
>>>>>>>> >> > together when ready.
>>>>>>>> >> >
>>>>>>>> >> > On 4 Mar 2025, at 18:15, Jon Haddad <j...@rustyrazorblade.com
>>>>>>>> >> > <mailto:j...@rustyrazorblade.com>> wrote:
>>>>>>>> >> >
>>>>>>>> >> > Very exciting!
>>>>>>>> >> >
>>>>>>>> >> > I have a client that's very interested in Accord, so I should
>>>>>>>> >> > have budget to dig into it, especially on the performance side of
>>>>>>>> >> > things.
>>>>>>>> >> >
>>>>>>>> >> > Jon
>>>>>>>> >> >
>>>>>>>> >> > On Tue, Mar 4, 2025 at 9:57 AM Dmitry Konstantinov
>>>>>>>> >> > <netud...@gmail.com <mailto:netud...@gmail.com>> wrote:
>>>>>>>> >> >>
>>>>>>>> >> >> Thank you to all Accord and TCM contributors, it is really
>>>>>>>> >> >> exciting to see a development of such huge and wonderful
>>>>>>>> >> >> features moving forward and opening the door to the new
>>>>>>>> >> >> Cassandra epoch!
>>>>>>>> >> >>
>>>>>>>> >> >> On Tue, 4 Mar 2025 at 20:45, Blake Eggleston
>>>>>>>> >> >> <bl...@ultrablake.com <mailto:bl...@ultrablake.com>> wrote:
>>>>>>>> >> >>>
>>>>>>>> >> >>> Thanks Benedict!
>>>>>>>> >> >>>
>>>>>>>> >> >>> I’m really excited to see accord reach this milestone, even
>>>>>>>> >> >>> with these caveats. You seem to have left yourself off the list
>>>>>>>> >> >>> of contributors though, even though you’ve been a central
>>>>>>>> >> >>> figure in its development :) So thanks to all accord & tcm
>>>>>>>> >> >>> contributors, including Benedict, for making this possible!
>>>>>>>> >> >>>
>>>>>>>> >> >>> On Tue, Mar 4, 2025, at 8:00 AM, Benedict Elliott Smith wrote:
>>>>>>>> >> >>>
>>>>>>>> >> >>> Hi everyone,
>>>>>>>> >> >>>
>>>>>>>> >> >>> It’s been exactly 3.5 years since the first commit to
>>>>>>>> >> >>> cassandra-accord. Yes, really, it’s been that long.
>>>>>>>> >> >>>
>>>>>>>> >> >>> We will be starting to validate the feature against real
>>>>>>>> >> >>> workloads in the near future, so we can’t sensibly push off
>>>>>>>> >> >>> merging much longer. The following is a brief run-down of the
>>>>>>>> >> >>> state of play.
>>>>>>>> >> >>> There are no known bugs, but there remain a
>>>>>>>> >> >>> number of caveats we will be incrementally addressing in the
>>>>>>>> >> >>> run-up to a full release:
>>>>>>>> >> >>>
>>>>>>>> >> >>> [1] Accord is likely to be SLOW until further optimisations are
>>>>>>>> >> >>> implemented
>>>>>>>> >> >>> [2] Schema changes have a number of hard edges
>>>>>>>> >> >>> [3] Validation is ongoing, so there are likely still a number
>>>>>>>> >> >>> of bugs to shake out
>>>>>>>> >> >>> [4] Many operator visibility/tooling/documentation improvements
>>>>>>>> >> >>> are pending
>>>>>>>> >> >>>
>>>>>>>> >> >>> To expand a little:
>>>>>>>> >> >>>
>>>>>>>> >> >>> [1] As of the last experiment we conducted, accord’s throughput
>>>>>>>> >> >>> was poor - also leading to higher LAN latencies. We have done
>>>>>>>> >> >>> no WAN experiments to date, but the protocol guarantees should
>>>>>>>> >> >>> already achieve better round-trip performance, in particular
>>>>>>>> >> >>> under contention. Improving throughput will be the main focus
>>>>>>>> >> >>> of attention once we are satisfied the protocol is otherwise
>>>>>>>> >> >>> stable, but our focus remains validation for the moment.
>>>>>>>> >> >>> [2] Schema changes have not yet been well integrated with TCM.
>>>>>>>> >> >>> Dropping a table for instance will currently cause problems if
>>>>>>>> >> >>> nodes are offline.
>>>>>>>> >> >>> [3] We have a range of validations we are already performing
>>>>>>>> >> >>> against cassandra-accord directly, and against its integration
>>>>>>>> >> >>> with Cassandra in cep-15-accord. We have run hundreds of
>>>>>>>> >> >>> billions of simulated transactions, and are still discovering
>>>>>>>> >> >>> some minor fault every few billion simulated transactions or
>>>>>>>> >> >>> so.
>>>>>>>> >> >>> There remains a lot more simulated validation to explore,
>>>>>>>> >> >>> as well as with real clusters serving real workloads.
>>>>>>>> >> >>> [4] There are already a range of virtual tables for exploring
>>>>>>>> >> >>> internal state in Accord, and reasonably good metric support.
>>>>>>>> >> >>> However, tracing is not yet supported, and our metric and
>>>>>>>> >> >>> virtual table integrations need some further development.
>>>>>>>> >> >>> [5] There are also other edge cases to address such as ensuring
>>>>>>>> >> >>> we do not reuse HLCs after restart, supporting
>>>>>>>> >> >>> ByteOrderPartitioner, and live migration from/to Paxos is
>>>>>>>> >> >>> undergoing fine-tuning and validation; probably there are some
>>>>>>>> >> >>> other things I am forgetting.
>>>>>>>> >> >>>
>>>>>>>> >> >>> Altogether the feature is fairly mature, despite these caveats.
>>>>>>>> >> >>> This is the fruit of the labour of a long list of contributors,
>>>>>>>> >> >>> including Aleksey Yeschenko, Alex Petrov, Ariel Weisberg, Blake
>>>>>>>> >> >>> Eggleston, Caleb Rackliffe and David Capwell, and represents a
>>>>>>>> >> >>> huge undertaking. It also wouldn’t have been possible without
>>>>>>>> >> >>> the work of Alex Petrov, Marcus Eriksson and Sam Tunnicliffe on
>>>>>>>> >> >>> delivering transactional cluster metadata. I hope you will join
>>>>>>>> >> >>> me in thanking them all for their contributions.
>>>>>>>> >> >>>
>>>>>>>> >> >>> Alex has also kindly produced some initial overview
>>>>>>>> >> >>> documentation for developers, that can be found here:
>>>>>>>> >> >>> https://github.com/apache/cassandra/blob/cep-15-accord/doc/modules/cassandra/pages/developing/accord/index.adoc.
>>>>>>>> >> >>> This will be expanded as time permits.
>>>>>>>> >> >>>
>>>>>>>> >> >>> Does anyone have any questions or concerns?
>>>>>>>> >> >>>
>>>>>>>> >> >>
>>>>>>>> >> >> --
>>>>>>>> >> >> Dmitry Konstantinov
>>>>>>>> >> >