Congratulations on this significant milestone and all of the years of effort to get to this point.
> On Apr 18, 2025, at 9:11 AM, Paulo Motta <pa...@apache.org> wrote: > > Awesome milestone, congrats and thanks to all involved! 👏👏👏 > > On Fri, 18 Apr 2025 at 05:19 Dmitry Konstantinov <netud...@gmail.com > <mailto:netud...@gmail.com>> wrote: >> Hooray! Huge thanks to all! Now, I have no more excuses — it's time to try >> it :-D >> >> On Thu, 17 Apr 2025 at 23:42, Jordan West <jorda...@gmail.com >> <mailto:jorda...@gmail.com>> wrote: >>> Congrats all! My previous reservations (that have been addressed) aside, >>> this is an amazing milestone. Awesome, awesome work! >>> >>> Jordan >>> >>> On Thu, Apr 17, 2025 at 15:07 David Capwell <dcapw...@apple.com >>> <mailto:dcapw...@apple.com>> wrote: >>>> I have merged cep-15-accord into trunk. If you experience any issues >>>> please reach out to me >>>> >>>> >>>>> On Apr 17, 2025, at 12:55 AM, Benedict Elliott Smith <bened...@apache.org >>>>> <mailto:bened...@apache.org>> wrote: >>>>> >>>>> Final update: David has completed a second rebase after we reached parity >>>>> with trunk on our CI, and has confirmed tests remain stable. So I expect >>>>> CEP-15 to merge to trunk sometime today. >>>>> >>>>> No doubt there will be some unexpected disruption to others after a patch >>>>> like this lands. Reach out via slack if you have any trouble. >>>>> >>>>>> On 16 Mar 2025, at 10:44, Benedict Elliott Smith <bened...@apache.org >>>>>> <mailto:bened...@apache.org>> wrote: >>>>>> >>>>>> Hi everyone, >>>>>> >>>>>> To update you: the last patches we considered blockers have landed in >>>>>> the cep-15-accord branch. Caleb has now started rebasing the branch onto >>>>>> trunk. I expect there will be a few failing tests still to resolve at >>>>>> that point, but once they have been squashed we will proceed with the >>>>>> merge. >>>>>> >>>>>> There remains more work to do before release, and I will publish a >>>>>> detailed roadmap to Jira when I’m back in a couple of weeks. 
>>>>>> >>>>>> >>>>>>> On 11 Mar 2025, at 20:12, Nate McCall <zznat...@gmail.com >>>>>>> <mailto:zznat...@gmail.com>> wrote: >>>>>>> >>>>>>> It sounds like we are all pretty interested in seeing this feature land >>>>>>> and the branch maintenance is causing overhead that could be spent on >>>>>>> finalisation. +1 on merging, particularly given the feature flag work. >>>>>>> >>>>>>> Once more unto the breach 💪 >>>>>>> >>>>>>> On Fri, 7 Mar 2025 at 6:56 PM, Benedict <bened...@apache.org >>>>>>> <mailto:bened...@apache.org>> wrote: >>>>>>>> There are essentially three possible timelines to choose from here: >>>>>>>> >>>>>>>> 1) We agree in the next few days to merge to trunk. We will then >>>>>>>> prioritise rebasing onto trunk and resolving any pre-merge items >>>>>>>> starting next week. >>>>>>>> 2) There’s some more debate and agreement to merge to trunk in a week >>>>>>>> or two. In the meantime we will shift to internal-first development >>>>>>>> but we’ll likely prioritise the above work as soon as we can, which >>>>>>>> may be in a few weeks, so we can shift to trunk first development. >>>>>>>> 3) We don’t agree to merge accord anytime soon, so we shift to >>>>>>>> internal-first development for the time being. I’m not sure when we >>>>>>>> will prioritise any of the above. >>>>>>>> >>>>>>>> Our resources are finite and we’ve exhausted them (literally), so it’s >>>>>>>> pretty much pick one of the above. I don’t really mind which you pick, >>>>>>>> but I won’t personally be prioritising merge after this third attempt. >>>>>>>> >>>>>>>>> On 6 Mar 2025, at 22:01, Jon Haddad <j...@rustyrazorblade.com >>>>>>>>> <mailto:j...@rustyrazorblade.com>> wrote: >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>>> Hmm... I took a look at the cep-15-accord branch in GitHub, it looks >>>>>>>>> like it's several hundred commits behind trunk. 
Since you'll need to >>>>>>>>> rebase again before merge *anyways*, would it make sense to do it >>>>>>>>> once more, and I can publish easy-cass-lab with the latest branch? >>>>>>>>> If folks have concerns, it's easy to fire up a cluster (I do it >>>>>>>>> constantly) and try it out. >>>>>>>>> >>>>>>>>> I think if we were to do this, out of consideration we should time >>>>>>>>> box the amount of time for an evaluation and unless someone raises an >>>>>>>>> objection, consider lazy consensus achieved. >>>>>>>>> >>>>>>>>> Jon >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Thu, Mar 6, 2025 at 12:46 PM Benedict Elliott Smith >>>>>>>>> <bened...@apache.org <mailto:bened...@apache.org>> wrote: >>>>>>>>>> Because we want to validate against the latest code in trunk, else >>>>>>>>>> we are validating stale behaviours. The cost of rebasing is high, so >>>>>>>>>> we do not do it frequently. That means we will likely stop >>>>>>>>>> developing OSS-first, as the focus will have to move to our internal >>>>>>>>>> branch that satisfies these criteria. >>>>>>>>>> >>>>>>>>>> Exactly what this might be for upstreaming I cannot say. Personally, >>>>>>>>>> I aim to work exclusively on the branch we are stabilising. If that >>>>>>>>>> is not trunk, the latency for my contributions being made public >>>>>>>>>> might be high, as I have a huge imbalance of over-investment to >>>>>>>>>> recoup, and anything unnecessary will be deferred. >>>>>>>>>> >>>>>>>>>> Since the feature is disabled, and the code is almost entirely >>>>>>>>>> isolated, I cannot imagine the cost to the community of removing >>>>>>>>>> this work would be very high. But, I do not intend to argue Accord’s >>>>>>>>>> case here. I will let you all decide. >>>>>>>>>> >>>>>>>>>> Please decide soon though, as it shapes our work planning. 
The >>>>>>>>>> positive reception so far had led me to consider prioritising a >>>>>>>>>> move to trunk-first development within the next week or two, and the >>>>>>>>>> associated work that entails. However, if that was optimistic we >>>>>>>>>> will have to shift our plans. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> On 6 Mar 2025, at 20:16, Jordan West <jw...@apache.org >>>>>>>>>>> <mailto:jw...@apache.org>> wrote: >>>>>>>>>>> >>>>>>>>>>> The work and effort in accord has been amazing. And I’m sure it >>>>>>>>>>> sets a new standard for code quality and correctness testing which >>>>>>>>>>> I’m also entirely behind. I also trust the folks working on it want >>>>>>>>>>> to take it to a fully production-ready solution. But I’m >>>>>>>>>>> worried about circumstances out of our control leaving us with a >>>>>>>>>>> very complex feature that isn’t complete. >>>>>>>>>>> >>>>>>>>>>> I do have some questions. Could folks help me better understand why >>>>>>>>>>> testing real workloads necessitates a merge (my understanding from >>>>>>>>>>> the original reason is this is the impetus for why we would merge >>>>>>>>>>> now)? Also I think the performance and schema change caveats are >>>>>>>>>>> rather large ones. One of Accord’s promises was better performance >>>>>>>>>>> and I think making schema changes with nodes down not being >>>>>>>>>>> supported is a big gap. Could we have some criteria like “supports >>>>>>>>>>> all the operations PaxosV2 supports” or “performs as well or better >>>>>>>>>>> than PaxosV2 on [workload(s)]”? >>>>>>>>>>> >>>>>>>>>>> I understand waiting asks a lot of the authors in terms of bearing >>>>>>>>>>> the burden of a more complex merge. But I think we also need to >>>>>>>>>>> consider what merging is asking the community to bear if the worst >>>>>>>>>>> happens and we are unable to take the feature from its current >>>>>>>>>>> state to something that can be widely used in production. 
>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Jordan >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Wed, Mar 5, 2025 at 15:52 Blake Eggleston <bl...@ultrablake.com >>>>>>>>>>> <mailto:bl...@ultrablake.com>> wrote: >>>>>>>>>>>> +1 to merging it >>>>>>>>>>>> >>>>>>>>>>>> On Wed, Mar 5, 2025, at 12:22 PM, Patrick McFadin wrote: >>>>>>>>>>>>> You have my +1 >>>>>>>>>>>>> >>>>>>>>>>>>> On Wed, Mar 5, 2025 at 12:16 PM Benedict <bened...@apache.org >>>>>>>>>>>>> <mailto:bened...@apache.org>> wrote: >>>>>>>>>>>>> > >>>>>>>>>>>>> > Correct, these caveats should only apply to tables that have >>>>>>>>>>>>> > opted-in to accord. >>>>>>>>>>>>> > >>>>>>>>>>>>> > On 5 Mar 2025, at 20:08, Jeremiah Jordan <jerem...@apache.org >>>>>>>>>>>>> > <mailto:jerem...@apache.org>> wrote: >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> > So great to see all this hard work about to pay off! >>>>>>>>>>>>> > >>>>>>>>>>>>> > On the questions/concerns front, the only concern I would have >>>>>>>>>>>>> > towards merging this to trunk is if any of the caveats apply >>>>>>>>>>>>> > when someone is not using Accord. Assuming they only apply >>>>>>>>>>>>> > when the feature flag is enabled, I see no reason not to get >>>>>>>>>>>>> > this merged into trunk once everyone involved is happy with the >>>>>>>>>>>>> > state of it. >>>>>>>>>>>>> > >>>>>>>>>>>>> > -Jeremiah >>>>>>>>>>>>> > >>>>>>>>>>>>> > On Mar 5, 2025 at 12:15:23 PM, Benedict Elliott Smith >>>>>>>>>>>>> > <bened...@apache.org <mailto:bened...@apache.org>> wrote: >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> That depends on all of you lovely people :D >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> I think we should have finished merging everything we want >>>>>>>>>>>>> >> before QA by ~Monday; certainly not much later. >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> I think we have some upgrade and python dtest failures to >>>>>>>>>>>>> >> address as well. >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> So it could be pretty soon if the community is supportive. 
>>>>>>>>>>>>> >> >>>>>>>>>>>>> >> On 5 Mar 2025, at 17:22, Patrick McFadin <pmcfa...@gmail.com >>>>>>>>>>>>> >> <mailto:pmcfa...@gmail.com>> wrote: >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> What is the timing for starting the merge process? I'm asking >>>>>>>>>>>>> >> because >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> I have (yet another) presentation and this would be a cool >>>>>>>>>>>>> >> update. >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> On Wed, Mar 5, 2025 at 1:22 AM Benedict Elliott Smith >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> <bened...@apache.org <mailto:bened...@apache.org>> wrote: >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> > >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> > Thanks everyone. >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> > >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> > Jon - your help will be greatly appreciated. We’ll let you >>>>>>>>>>>>> >> > know when we’ve got the cycles to invest in performance work >>>>>>>>>>>>> >> > (hopefully fairly soon). I expect the first step will be >>>>>>>>>>>>> >> > improving visibility so we can better understand what the >>>>>>>>>>>>> >> > system is doing (particularly the caching layers), but we >>>>>>>>>>>>> >> > can dig in together when ready. >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> > >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> > On 4 Mar 2025, at 18:15, Jon Haddad >>>>>>>>>>>>> >> > <j...@rustyrazorblade.com <mailto:j...@rustyrazorblade.com>> >>>>>>>>>>>>> >> > wrote: >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> > >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> > Very exciting! >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> > >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> > I have a client that's very interested in Accord, so I >>>>>>>>>>>>> >> > should have budget to dig into it, especially on the >>>>>>>>>>>>> >> > performance side of things. 
>>>>>>>>>>>>> >> >>>>>>>>>>>>> >> > >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> > Jon >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> > >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> > On Tue, Mar 4, 2025 at 9:57 AM Dmitry Konstantinov >>>>>>>>>>>>> >> > <netud...@gmail.com <mailto:netud...@gmail.com>> wrote: >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >> Thank you to all Accord and TCM contributors, it is really >>>>>>>>>>>>> >> >> exciting to see a development of such huge and wonderful >>>>>>>>>>>>> >> >> features moving forward and opening the door to the new >>>>>>>>>>>>> >> >> Cassandra epoch! >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >> On Tue, 4 Mar 2025 at 20:45, Blake Eggleston >>>>>>>>>>>>> >> >> <bl...@ultrablake.com <mailto:bl...@ultrablake.com>> wrote: >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>> >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>> Thanks Benedict! >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>> >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>> I’m really excited to see accord reach this milestone, >>>>>>>>>>>>> >> >>> even with these caveats. You seem to have left yourself >>>>>>>>>>>>> >> >>> off the list of contributors though, even though you’ve >>>>>>>>>>>>> >> >>> been a central figure in its development :) So thanks to >>>>>>>>>>>>> >> >>> all accord & tcm contributors, including Benedict, for >>>>>>>>>>>>> >> >>> making this possible! >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>> >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>> On Tue, Mar 4, 2025, at 8:00 AM, Benedict Elliott Smith >>>>>>>>>>>>> >> >>> wrote: >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>> >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>> Hi everyone, >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>> >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>> It’s been exactly 3.5 years since the first commit to >>>>>>>>>>>>> >> >>> cassandra-accord. Yes, really, it’s been that long. 
>>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>> >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>> We will be starting to validate the feature against real >>>>>>>>>>>>> >> >>> workloads in the near future, so we can’t sensibly push >>>>>>>>>>>>> >> >>> off merging much longer. The following is a brief run-down >>>>>>>>>>>>> >> >>> of the state of play. There are no known bugs, but there >>>>>>>>>>>>> >> >>> remain a number of caveats we will be incrementally >>>>>>>>>>>>> >> >>> addressing in the run-up to a full release: >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>> >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>> [1] Accord is likely to be SLOW until further >>>>>>>>>>>>> >> >>> optimisations are implemented >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>> [2] Schema changes have a number of hard edges >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>> [3] Validation is ongoing, so there are likely still a >>>>>>>>>>>>> >> >>> number of bugs to shake out >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>> [4] Many operator visibility/tooling/documentation >>>>>>>>>>>>> >> >>> improvements are pending >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>> >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>> To expand a little: >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>> >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>> [1] As of the last experiment we conducted, accord’s >>>>>>>>>>>>> >> >>> throughput was poor - also leading to higher LAN >>>>>>>>>>>>> >> >>> latencies. We have done no WAN experiments to date, but >>>>>>>>>>>>> >> >>> the protocol guarantees should already achieve better >>>>>>>>>>>>> >> >>> round-trip performance, in particular under contention. >>>>>>>>>>>>> >> >>> Improving throughput will be the main focus of attention >>>>>>>>>>>>> >> >>> once we are satisfied the protocol is otherwise stable, >>>>>>>>>>>>> >> >>> but our focus remains validation for the moment. >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>> [2] Schema changes have not yet been well integrated with >>>>>>>>>>>>> >> >>> TCM. 
Dropping a table for instance will currently cause >>>>>>>>>>>>> >> >>> problems if nodes are offline. >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>> [3] We have a range of validations we are already >>>>>>>>>>>>> >> >>> performing against cassandra-accord directly, and against >>>>>>>>>>>>> >> >>> its integration with Cassandra in cep-15-accord. We have >>>>>>>>>>>>> >> >>> run hundreds of billions of simulated transactions, and >>>>>>>>>>>>> >> >>> are still discovering some minor fault every few billion >>>>>>>>>>>>> >> >>> simulated transactions or so. There remains a lot more >>>>>>>>>>>>> >> >>> simulated validation to explore, as well as with real >>>>>>>>>>>>> >> >>> clusters serving real workloads. >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>> [4] There are already a range of virtual tables for >>>>>>>>>>>>> >> >>> exploring internal state in Accord, and reasonably good >>>>>>>>>>>>> >> >>> metric support. However, tracing is not yet supported, and >>>>>>>>>>>>> >> >>> our metric and virtual table integrations need some >>>>>>>>>>>>> >> >>> further development. >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>> [5] There are also other edge cases to address such as >>>>>>>>>>>>> >> >>> ensuring we do not reuse HLCs after restart, supporting >>>>>>>>>>>>> >> >>> ByteOrderPartitioner, and live migration from/to Paxos is >>>>>>>>>>>>> >> >>> undergoing fine-tuning and validation; probably there are >>>>>>>>>>>>> >> >>> some other things I am forgetting. >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>> >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>> Altogether the feature is fairly mature, despite these >>>>>>>>>>>>> >> >>> caveats. This is the fruit of the labour of a long list of >>>>>>>>>>>>> >> >>> contributors, including Aleksey Yeschenko, Alex Petrov, >>>>>>>>>>>>> >> >>> Ariel Weisberg, Blake Eggleston, Caleb Rackliffe and David >>>>>>>>>>>>> >> >>> Capwell, and represents a huge undertaking. 
It also >>>>>>>>>>>>> >> >>> wouldn’t have been possible without the work of Alex >>>>>>>>>>>>> >> >>> Petrov, Marcus Eriksson and Sam Tunnicliffe on delivering >>>>>>>>>>>>> >> >>> transactional cluster metadata. I hope you will join me in >>>>>>>>>>>>> >> >>> thanking them all for their contributions. >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>> >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>> Alex has also kindly produced some initial overview >>>>>>>>>>>>> >> >>> documentation for developers, that can be found here: >>>>>>>>>>>>> >> >>> https://github.com/apache/cassandra/blob/cep-15-accord/doc/modules/cassandra/pages/developing/accord/index.adoc. >>>>>>>>>>>>> >> >>> This will be expanded as time permits. >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>> >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>> Does anyone have any questions or concerns? >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>> >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>> >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >> -- >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >> Dmitry Konstantinov >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> > >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> > >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> >>>>>> >>>>> >>>> >> >> >> >> -- >> Dmitry Konstantinov