Awesome milestone, congrats and thanks to all involved! 👏👏👏 On Fri, 18 Apr 2025 at 05:19 Dmitry Konstantinov <netud...@gmail.com> wrote:
> Hooray! Huge thanks to all! Now, I have no more excuses — it's time to try > it :-D > > On Thu, 17 Apr 2025 at 23:42, Jordan West <jorda...@gmail.com> wrote: > >> Congrats all! My previous reservations (that have been addressed) aside, >> this is an amazing milestone. Awesome, awesome work! >> >> Jordan >> >> On Thu, Apr 17, 2025 at 15:07 David Capwell <dcapw...@apple.com> wrote: >> >>> I have merged cep-15-accord into trunk. If you experience any issues >>> please reach out to me >>> >>> >>> On Apr 17, 2025, at 12:55 AM, Benedict Elliott Smith < >>> bened...@apache.org> wrote: >>> >>> Final update: David has completed a second rebase after we reached >>> parity with trunk on our CI, and has confirmed tests remain stable. So I >>> expect CEP-15 to merge to trunk sometime today. >>> >>> No doubt there will be some unexpected disruption to others after a >>> patch like this lands. Reach out via slack if you have any trouble. >>> >>> On 16 Mar 2025, at 10:44, Benedict Elliott Smith <bened...@apache.org> >>> wrote: >>> >>> Hi everyone, >>> >>> To update you: the last patches we considered blockers have landed in >>> the cep-15-accord branch. Caleb has now started rebasing the branch onto >>> trunk. I expect there will be a few failing tests still to resolve at that >>> point, but once they have been squashed we will proceed with the merge. >>> >>> There remains more work to do before release, and I will publish a >>> detailed roadmap to Jira when I’m back in a couple of weeks. >>> >>> >>> On 11 Mar 2025, at 20:12, Nate McCall <zznat...@gmail.com> wrote: >>> >>> It sounds like we are all pretty interested in seeing this feature land >>> and the branch maintenance is causing overhead that could be spent on >>> finalisation. +1 on merging, particularly given the feature flag work. 
>>> >>> Once more unto the breach 💪 >>> >>> On Fri, 7 Mar 2025 at 6:56 PM, Benedict <bened...@apache.org> wrote: >>> >>>> There are essentially three possible timelines to choose from here: >>>> >>>> 1) We agree in the next few days to merge to trunk. We will then >>>> prioritise rebasing onto trunk and resolving any pre-merge items starting >>>> next week. >>>> 2) There’s some more debate and agreement to merge to trunk in a week >>>> or two. In the meantime we will shift to internal-first development but >>>> we’ll likely prioritise the above work as soon as we can, which may be in a >>>> few weeks, so we can shift to trunk-first development. >>>> 3) We don’t agree to merge Accord anytime soon, so we shift to >>>> internal-first development for the time being. I’m not sure when we will >>>> prioritise any of the above. >>>> >>>> Our resources are finite and we’ve exhausted them (literally), so it’s >>>> pretty much pick one of the above. I don’t really mind which you pick, but >>>> I won’t personally be prioritising merge after this third attempt. >>>> >>>> On 6 Mar 2025, at 22:01, Jon Haddad <j...@rustyrazorblade.com> wrote: >>>> >>>> >>>> >>>> Hmm... I took a look at the cep-15-accord branch on GitHub; it looks >>>> like it's several hundred commits behind trunk. Since you'll need to >>>> rebase again before merge *anyways*, would it make sense to do it once >>>> more, and I can publish easy-cass-lab with the latest branch? If folks >>>> have concerns, it's easy to fire up a cluster (I do it constantly) and try >>>> it out. >>>> >>>> I think if we were to do this, out of consideration we should time-box >>>> the amount of time for an evaluation and unless someone raises an >>>> objection, consider lazy consensus achieved.
>>>> >>>> Jon >>>> >>>> >>>> >>>> On Thu, Mar 6, 2025 at 12:46 PM Benedict Elliott Smith < >>>> bened...@apache.org> wrote: >>>> >>>>> Because we want to validate against the latest code in trunk, else we >>>>> are validating stale behaviours. The cost of rebasing is high, so we do >>>>> not >>>>> do it frequently. That means we will likely stop developing OSS-first, as >>>>> the focus will have to move to our internal branch that satisfies these >>>>> criteria. >>>>> >>>>> Exactly what this might mean for upstreaming I cannot say. Personally, I >>>>> aim to work exclusively on the branch we are stabilising. If that is not >>>>> trunk, the latency for my contributions being made public might be high, >>>>> as >>>>> I have a huge imbalance of over-investment to recoup, and anything >>>>> unnecessary will be deferred. >>>>> >>>>> Since the feature is disabled, and the code is almost entirely >>>>> isolated, I cannot imagine the cost to the community of removing this work >>>>> would be very high. But I do not intend to argue Accord’s case here. I >>>>> will let you all decide. >>>>> >>>>> Please decide soon though, as it shapes our work planning. The >>>>> positive reception so far had led me to consider prioritising a move to >>>>> trunk-first development within the next week or two, and the associated >>>>> work that entails. However, if that was optimistic we will have to shift >>>>> our plans. >>>>> >>>>> >>>>> >>>>> On 6 Mar 2025, at 20:16, Jordan West <jw...@apache.org> wrote: >>>>> >>>>> The work and effort in Accord has been amazing. And I’m sure it sets a >>>>> new standard for code quality and correctness testing which I’m also >>>>> entirely behind. I also trust the folks working on it want to take it to >>>>> a fully production-ready solution. But I’m worried about circumstances >>>>> out of our control leaving us with a very complex feature that isn’t >>>>> complete. >>>>> >>>>> I do have some questions.
Could folks help me better understand why >>>>> testing real workloads necessitates a merge (my understanding from the >>>>> original reason is this is the impetus for why we would merge now)? Also I >>>>> think the performance and schema change caveats are rather large ones. One >>>>> of Accord's promises was better performance and I think making schema >>>>> changes >>>>> with nodes down not being supported is a big gap. Could we have some >>>>> criteria like “supports all the operations PaxosV2 supports” or “performs >>>>> as well or better than PaxosV2 on [workload(s)]”? >>>>> >>>>> I understand waiting asks a lot of the authors in terms of bearing the >>>>> burden of a more complex merge. But I think we also need to consider what >>>>> merging is asking the community to bear if the worst happens and we are >>>>> unable to take the feature from its current state to something that can be >>>>> widely used in production. >>>>> >>>>> >>>>> Jordan >>>>> >>>>> >>>>> On Wed, Mar 5, 2025 at 15:52 Blake Eggleston <bl...@ultrablake.com> >>>>> wrote: >>>>>> +1 to merging it >>>>>> >>>>>> On Wed, Mar 5, 2025, at 12:22 PM, Patrick McFadin wrote: >>>>>> >>>>>> You have my +1 >>>>>> >>>>>> On Wed, Mar 5, 2025 at 12:16 PM Benedict <bened...@apache.org> wrote: >>>>>> > >>>>>> > Correct, these caveats should only apply to tables that have >>>>>> opted in to Accord. >>>>>> > >>>>>> > On 5 Mar 2025, at 20:08, Jeremiah Jordan <jerem...@apache.org> >>>>>> wrote: >>>>>> > >>>>>> > >>>>>> > So great to see all this hard work about to pay off! >>>>>> > >>>>>> > On the questions/concerns front, the only concern I would have >>>>>> towards merging this to trunk is if any of the caveats apply when someone >>>>>> is not using Accord. Assuming they only apply when the feature flag is >>>>>> enabled, I see no reason not to get this merged into trunk once everyone >>>>>> involved is happy with the state of it.
>>>>>> > >>>>>> > -Jeremiah >>>>>> > >>>>>> > On Mar 5, 2025 at 12:15:23 PM, Benedict Elliott Smith < >>>>>> bened...@apache.org> wrote: >>>>>> >> >>>>>> >> That depends on all of you lovely people :D >>>>>> >> >>>>>> >> I think we should have finished merging everything we want before >>>>>> QA by ~Monday; certainly not much later. >>>>>> >> >>>>>> >> I think we have some upgrade and python dtest failures to address >>>>>> as well. >>>>>> >> >>>>>> >> So it could be pretty soon if the community is supportive. >>>>>> >> >>>>>> >> On 5 Mar 2025, at 17:22, Patrick McFadin <pmcfa...@gmail.com> >>>>>> wrote: >>>>>> >> >>>>>> >> >>>>>> >> What is the timing for starting the merge process? I'm asking >>>>>> because >>>>>> >> >>>>>> >> I have (yet another) presentation and this would be a cool update. >>>>>> >> >>>>>> >> >>>>>> >> On Wed, Mar 5, 2025 at 1:22 AM Benedict Elliott Smith >>>>>> >> >>>>>> >> <bened...@apache.org> wrote: >>>>>> >> >>>>>> >> > >>>>>> >> >>>>>> >> > Thanks everyone. >>>>>> >> >>>>>> >> > >>>>>> >> >>>>>> >> > Jon - your help will be greatly appreciated. We’ll let you know >>>>>> when we’ve got the cycles to invest in performance work (hopefully fairly >>>>>> soon). I expect the first step will be improving visibility so we can >>>>>> better understand what the system is doing (particularly the caching >>>>>> layers), but we can dig in together when ready. >>>>>> >> >>>>>> >> > >>>>>> >> >>>>>> >> > On 4 Mar 2025, at 18:15, Jon Haddad <j...@rustyrazorblade.com> >>>>>> wrote: >>>>>> >> >>>>>> >> > >>>>>> >> >>>>>> >> > Very exciting! >>>>>> >> >>>>>> >> > >>>>>> >> >>>>>> >> > I have a client that's very interested in Accord, so I should >>>>>> have budget to dig into it, especially on the performance side of things. 
>>>>>> >> >>>>>> >> > >>>>>> >> >>>>>> >> > Jon >>>>>> >> >>>>>> >> > >>>>>> >> >>>>>> >> > On Tue, Mar 4, 2025 at 9:57 AM Dmitry Konstantinov < >>>>>> netud...@gmail.com> wrote: >>>>>> >> >>>>>> >> >> >>>>>> >> >>>>>> >> >> Thank you to all Accord and TCM contributors; it is really >>>>>> exciting to see the development of such huge and wonderful features moving >>>>>> forward and opening the door to the new Cassandra epoch! >>>>>> >> >>>>>> >> >> >>>>>> >> >>>>>> >> >> On Tue, 4 Mar 2025 at 20:45, Blake Eggleston < >>>>>> bl...@ultrablake.com> wrote: >>>>>> >> >>>>>> >> >>> >>>>>> >> >>>>>> >> >>> Thanks Benedict! >>>>>> >> >>>>>> >> >>> >>>>>> >> >>>>>> >> >>> I’m really excited to see Accord reach this milestone, even >>>>>> with these caveats. You seem to have left yourself off the list of >>>>>> contributors, even though you’ve been a central figure in its >>>>>> development :) So thanks to all Accord & TCM contributors, including >>>>>> Benedict, for making this possible! >>>>>> >> >>>>>> >> >>> >>>>>> >> >>>>>> >> >>> On Tue, Mar 4, 2025, at 8:00 AM, Benedict Elliott Smith wrote: >>>>>> >> >>>>>> >> >>> >>>>>> >> >>>>>> >> >>> Hi everyone, >>>>>> >> >>>>>> >> >>> >>>>>> >> >>>>>> >> >>> It’s been exactly 3.5 years since the first commit to >>>>>> cassandra-accord. Yes, really, it’s been that long. >>>>>> >> >>>>>> >> >>> >>>>>> >> >>>>>> >> >>> We will be starting to validate the feature against real >>>>>> workloads in the near future, so we can’t sensibly push off merging much >>>>>> longer. The following is a brief run-down of the state of play.
There are >>>>>> no known bugs, but there remain a number of caveats we will be >>>>>> incrementally addressing in the run-up to a full release: >>>>>> >> >>>>>> >> >>> >>>>>> >> >>>>>> >> >>> [1] Accord is likely to be SLOW until further optimisations >>>>>> are implemented >>>>>> >> >>>>>> >> >>> [2] Schema changes have a number of hard edges >>>>>> >> >>>>>> >> >>> [3] Validation is ongoing, so there are likely still a number >>>>>> of bugs to shake out >>>>>> >> >>>>>> >> >>> [4] Many operator visibility/tooling/documentation >>>>>> improvements are pending >>>>>> >> >>>>>> >> >>> >>>>>> >> >>>>>> >> >>> To expand a little: >>>>>> >> >>>>>> >> >>> >>>>>> >> >>>>>> >> >>> [1] As of the last experiment we conducted, accord’s >>>>>> throughput was poor - also leading to higher LAN latencies. We have done >>>>>> no >>>>>> WAN experiments to date, but the protocol guarantees should already >>>>>> achieve >>>>>> better round-trip performance, in particular under contention. Improving >>>>>> throughput will be the main focus of attention once we are satisfied the >>>>>> protocol is otherwise stable, but our focus remains validation for the >>>>>> moment. >>>>>> >> >>>>>> >> >>> [2] Schema changes have not yet been well integrated with TCM. >>>>>> Dropping a table for instance will currently cause problems if nodes are >>>>>> offline. >>>>>> >> >>>>>> >> >>> [3] We have a range of validations we are already performing >>>>>> against cassandra-accord directly, and against its integration with >>>>>> Cassandra in cep-15-accord. We have run hundreds of billions of simulated >>>>>> transactions, and are still discovering some minor fault every few >>>>>> billion >>>>>> simulated transactions or so. There remains a lot more simulated >>>>>> validation >>>>>> to explore, as well as with real clusters serving real workloads. 
>>>>>> >> >>>>>> >> >>> [4] There are already a range of virtual tables for exploring >>>>>> internal state in Accord, and reasonably good metric support. However, >>>>>> tracing is not yet supported, and our metric and virtual table >>>>>> integrations >>>>>> need some further development. >>>>>> >> >>>>>> >> >>> [5] There are also other edge cases to address such as >>>>>> ensuring we do not reuse HLCs after restart, supporting >>>>>> ByteOrderPartitioner, and live migration from/to Paxos is undergoing >>>>>> fine-tuning and validation; probably there are some other things I am >>>>>> forgetting. >>>>>> >> >>>>>> >> >>> >>>>>> >> >>>>>> >> >>> Altogether the feature is fairly mature, despite these >>>>>> caveats. This is the fruit of the labour of a long list of contributors, >>>>>> including Aleksey Yeschenko, Alex Petrov, Ariel Weisberg, Blake >>>>>> Eggleston, >>>>>> Caleb Rackliffe and David Capwell, and represents a huge undertaking. It >>>>>> also wouldn’t have been possible without the work of Alex Petrov, Marcus >>>>>> Eriksson and Sam Tunnicliffe on delivering transactional cluster >>>>>> metadata. >>>>>> I hope you will join me in thanking them all for their contributions. >>>>>> >> >>>>>> >> >>> >>>>>> >> >>>>>> >> >>> Alex has also kindly produced some initial overview >>>>>> documentation for developers, that can be found here: >>>>>> https://github.com/apache/cassandra/blob/cep-15-accord/doc/modules/cassandra/pages/developing/accord/index.adoc. >>>>>> This will be expanded as time permits. >>>>>> >> >>>>>> >> >>> >>>>>> >> >>>>>> >> >>> Does anyone have any questions or concerns? >>>>>> >> >>>>>> >> >>> >>>>>> >> >>>>>> >> >>> >>>>>> >> >>>>>> >> >> >>>>>> >> >>>>>> >> >> >>>>>> >> >>>>>> >> >> -- >>>>>> >> >>>>>> >> >> Dmitry Konstantinov >>>>>> >> >>>>>> >> > >>>>>> >> >>>>>> >> > >>>>>> >> >>>>>> >> >>>>>> >>>>>> >>>>>> >>>>> >>> >>> >>> > > -- > Dmitry Konstantinov >