Congrats all! My previous reservations (that have been addressed) aside,
this is an amazing milestone. Awesome, awesome work!

Jordan

On Thu, Apr 17, 2025 at 15:07 David Capwell <dcapw...@apple.com> wrote:

> I have merged cep-15-accord into trunk.  If you experience any issues
> please reach out to me.
>
>
> On Apr 17, 2025, at 12:55 AM, Benedict Elliott Smith <bened...@apache.org>
> wrote:
>
> Final update: David has completed a second rebase after we reached parity
> with trunk on our CI, and has confirmed tests remain stable. So I expect
> CEP-15 to merge to trunk sometime today.
>
> No doubt there will be some unexpected disruption to others after a patch
> like this lands. Reach out via slack if you have any trouble.
>
> On 16 Mar 2025, at 10:44, Benedict Elliott Smith <bened...@apache.org>
> wrote:
>
> Hi everyone,
>
> To update you: the last patches we considered blockers have landed in the
> cep-15-accord branch. Caleb has now started rebasing the branch onto trunk.
> I expect there will be a few failing tests still to resolve at that point,
> but once they have been squashed we will proceed with the merge.
>
> There remains more work to do before release, and I will publish a
> detailed roadmap to Jira when I’m back in a couple of weeks.
>
>
> On 11 Mar 2025, at 20:12, Nate McCall <zznat...@gmail.com> wrote:
>
> It sounds like we are all pretty interested in seeing this feature land
> and the branch maintenance is causing overhead that could be spent on
> finalisation. +1 on merging, particularly given the feature flag work.
>
> Once more unto the breach 💪
>
> On Fri, 7 Mar 2025 at 6:56 PM, Benedict <bened...@apache.org> wrote:
>
>> There are essentially three possible timelines to choose from here:
>>
>> 1) We agree in the next few days to merge to trunk. We will then
>> prioritise rebasing onto trunk and resolving any pre-merge items starting
>> next week.
>> 2) There’s some more debate and agreement to merge to trunk in a week or
>> two. In the meantime we will shift to internal-first development but we’ll
>> likely prioritise the above work as soon as we can, which may be in a few
>> weeks, so we can shift to trunk first development.
>> 3) We don’t agree to merge accord anytime soon, so we shift to
>> internal-first development for the time being. I’m not sure when we will
>> prioritise any of the above.
>>
>> Our resources are finite and we’ve exhausted them (literally), so it’s
>> pretty much pick one of the above. I don’t really mind which you pick, but
>> I won’t personally be prioritising merge after this third attempt.
>>
>> On 6 Mar 2025, at 22:01, Jon Haddad <j...@rustyrazorblade.com> wrote:
>>
>>
>>
>> Hmm... I took a look at the cep-15-accord branch in GitHub, it looks like
>> it's several hundred commits behind trunk.  Since you'll need to rebase
>> again before merge *anyways*, would it make sense to do it once more, and I
>> can publish easy-cass-lab with the latest branch?  If folks have concerns,
>> it's easy to fire up a cluster (I do it constantly) and try it out.
>>
>> I think if we were to do this, out of consideration we should time box
>> the amount of time for an evaluation and unless someone raises an
>> objection, consider lazy consensus achieved.
>>
>> Jon
>>
>>
>>
>> On Thu, Mar 6, 2025 at 12:46 PM Benedict Elliott Smith <
>> bened...@apache.org> wrote:
>>
>>> Because we want to validate against the latest code in trunk, else we
>>> are validating stale behaviours. The cost of rebasing is high, so we do not
>>> do it frequently. That means we will likely stop developing OSS-first, as
>>> the focus will have to move to our internal branch that satisfies these
>>> criteria.
>>>
>>> Exactly what this might be for upstreaming I cannot say. Personally, I
>>> aim to work exclusively on the branch we are stabilising. If that is not
>>> trunk, the latency for my contributions being made public might be high, as
>>> I have a huge imbalance of over-investment to recoup, and anything
>>> unnecessary will be deferred.
>>>
>>> Since the feature is disabled, and the code is almost entirely isolated,
>>> I cannot imagine the cost to the community of removing this work would be
>>> very high. But, I do not intend to argue Accord’s case here. I will let you
>>> all decide.
>>>
>>> Please decide soon though, as it shapes our work planning. The positive
>>> reception so far had led me to consider prioritising a move to trunk-first
>>> development within the next week or two, and the associated work that
>>> entails. However, if that was optimistic we will have to shift our plans.
>>>
>>>
>>>
>>> On 6 Mar 2025, at 20:16, Jordan West <jw...@apache.org> wrote:
>>>
>>> The work and effort in accord has been amazing. And I’m sure it sets a
>>> new standard for code quality and correctness testing which I’m also
>>> entirely behind. I also trust the folks working on it want to take it to
>>> a fully production-ready solution. But I’m worried about circumstances
>>> out of our control leaving us with a very complex feature that isn’t
>>> complete.
>>>
>>> I do have some questions. Could folks help me better understand why
>>> testing real workloads necessitates a merge (my understanding from the
>>> original reason is this is the impetus for why we would merge now)? Also I
>>> think the performance and schema change caveats are rather large ones. One
>>> of Accord’s promises was better performance and I think making schema changes
>>> with nodes down not being supported is a big gap. Could we have some
>>> criteria like “supports all the operations PaxosV2 supports” or “performs
>>> as well or better than PaxosV2 on [workload(s)]”?
>>>
>>> I understand waiting asks a lot of the authors in terms of bearing the
>>> burden of a more complex merge. But I think we also need to consider what
>>> merging is asking the community to bear if the worst happens and we are
>>> unable to take the feature from its current state to something that can be
>>> widely used in production.
>>>
>>>
>>> Jordan
>>>
>>>
>>> On Wed, Mar 5, 2025 at 15:52 Blake Eggleston <bl...@ultrablake.com>
>>> wrote:
>>>
>>>> +1 to merging it
>>>>
>>>> On Wed, Mar 5, 2025, at 12:22 PM, Patrick McFadin wrote:
>>>>
>>>> You have my +1
>>>>
>>>> On Wed, Mar 5, 2025 at 12:16 PM Benedict <bened...@apache.org> wrote:
>>>> >
>>>> > Correct, these caveats should only apply to tables that have opted-in
>>>> to accord.
>>>> >
>>>> > On 5 Mar 2025, at 20:08, Jeremiah Jordan <jerem...@apache.org> wrote:
>>>> >
>>>> >
>>>> > So great to see all this hard work about to pay off!
>>>> >
>>>> > On the questions/concerns front, the only concern I would have
>>>> towards merging this to trunk is if any of the caveats apply when someone
>>>> is not using Accord.  Assuming they only apply when the feature flag is
>>>> enabled, I see no reason not to get this merged into trunk once everyone
>>>> involved is happy with the state of it.
>>>> >
>>>> > -Jeremiah
>>>> >
>>>> > On Mar 5, 2025 at 12:15:23 PM, Benedict Elliott Smith <
>>>> bened...@apache.org> wrote:
>>>> >>
>>>> >> That depends on all of you lovely people :D
>>>> >>
>>>> >> I think we should have finished merging everything we want before QA
>>>> by ~Monday; certainly not much later.
>>>> >>
>>>> >> I think we have some upgrade and python dtest failures to address as
>>>> well.
>>>> >>
>>>> >> So it could be pretty soon if the community is supportive.
>>>> >>
>>>> >> On 5 Mar 2025, at 17:22, Patrick McFadin <pmcfa...@gmail.com> wrote:
>>>> >>
>>>> >>
>>>> >> What is the timing for starting the merge process? I'm asking because
>>>> >> I have (yet another) presentation and this would be a cool update.
>>>> >>
>>>> >>
>>>> >> On Wed, Mar 5, 2025 at 1:22 AM Benedict Elliott Smith
>>>> >> <bened...@apache.org> wrote:
>>>> >>
>>>> >> >
>>>> >>
>>>> >> > Thanks everyone.
>>>> >>
>>>> >> >
>>>> >>
>>>> >> > Jon - your help will be greatly appreciated. We’ll let you know
>>>> when we’ve got the cycles to invest in performance work (hopefully fairly
>>>> soon). I expect the first step will be improving visibility so we can
>>>> better understand what the system is doing (particularly the caching
>>>> layers), but we can dig in together when ready.
>>>> >>
>>>> >> >
>>>> >>
>>>> >> > On 4 Mar 2025, at 18:15, Jon Haddad <j...@rustyrazorblade.com>
>>>> wrote:
>>>> >>
>>>> >> >
>>>> >>
>>>> >> > Very exciting!
>>>> >>
>>>> >> >
>>>> >>
>>>> >> > I have a client that's very interested in Accord, so I should have
>>>> budget to dig into it, especially on the performance side of things.
>>>> >>
>>>> >> >
>>>> >>
>>>> >> > Jon
>>>> >>
>>>> >> >
>>>> >>
>>>> >> > On Tue, Mar 4, 2025 at 9:57 AM Dmitry Konstantinov <
>>>> netud...@gmail.com> wrote:
>>>> >>
>>>> >> >>
>>>> >>
>>>> >> >> Thank you to all Accord and TCM contributors, it is really
>>>> exciting to see the development of such huge and wonderful features moving
>>>> forward and opening the door to the new Cassandra epoch!
>>>> >>
>>>> >> >>
>>>> >>
>>>> >> >> On Tue, 4 Mar 2025 at 20:45, Blake Eggleston <
>>>> bl...@ultrablake.com> wrote:
>>>> >>
>>>> >> >>>
>>>> >>
>>>> >> >>> Thanks Benedict!
>>>> >>
>>>> >> >>>
>>>> >>
>>>> >> >>> I’m really excited to see accord reach this milestone, even with
>>>> these caveats. You seem to have left yourself off the list of contributors
>>>> though, even though you’ve been a central figure in its development :) So
>>>> thanks to all accord & tcm contributors, including Benedict, for making
>>>> this possible!
>>>> >>
>>>> >> >>>
>>>> >>
>>>> >> >>> On Tue, Mar 4, 2025, at 8:00 AM, Benedict Elliott Smith wrote:
>>>> >>
>>>> >> >>>
>>>> >>
>>>> >> >>> Hi everyone,
>>>> >>
>>>> >> >>>
>>>> >>
>>>> >> >>> It’s been exactly 3.5 years since the first commit to
>>>> cassandra-accord. Yes, really, it’s been that long.
>>>> >>
>>>> >> >>>
>>>> >>
>>>> >> >>> We will be starting to validate the feature against real
>>>> workloads in the near future, so we can’t sensibly push off merging much
>>>> longer. The following is a brief run-down of the state of play. There are
>>>> no known bugs, but there remain a number of caveats we will be
>>>> incrementally addressing in the run-up to a full release:
>>>> >>
>>>> >> >>>
>>>> >>
>>>> >> >>> [1] Accord is likely to be SLOW until further optimisations are
>>>> implemented
>>>> >>
>>>> >> >>> [2] Schema changes have a number of hard edges
>>>> >>
>>>> >> >>> [3] Validation is ongoing, so there are likely still a number of
>>>> bugs to shake out
>>>> >>
>>>> >> >>> [4] Many operator visibility/tooling/documentation improvements
>>>> are pending
>>>> >>
>>>> >> >>>
>>>> >>
>>>> >> >>> To expand a little:
>>>> >>
>>>> >> >>>
>>>> >>
>>>> >> >>> [1] As of the last experiment we conducted, accord’s throughput
>>>> was poor - also leading to higher LAN latencies. We have done no WAN
>>>> experiments to date, but the protocol guarantees should already achieve
>>>> better round-trip performance, in particular under contention. Improving
>>>> throughput will be the main focus of attention once we are satisfied the
>>>> protocol is otherwise stable, but our focus remains validation for the
>>>> moment.
>>>> >>
>>>> >> >>> [2] Schema changes have not yet been well integrated with TCM.
>>>> Dropping a table for instance will currently cause problems if nodes are
>>>> offline.
>>>> >>
>>>> >> >>> [3] We have a range of validations we are already performing
>>>> against cassandra-accord directly, and against its integration with
>>>> Cassandra in cep-15-accord. We have run hundreds of billions of simulated
>>>> transactions, and are still discovering some minor fault every few billion
>>>> simulated transactions or so. There remains a lot more simulated validation
>>>> to explore, as well as with real clusters serving real workloads.
>>>> >>
>>>> >> >>> [4] There are already a range of virtual tables for exploring
>>>> internal state in Accord, and reasonably good metric support. However,
>>>> tracing is not yet supported, and our metric and virtual table integrations
>>>> need some further development.
>>>> >>
>>>> >> >>> [5] There are also other edge cases to address, such as ensuring
>>>> we do not reuse HLCs after restart and supporting ByteOrderedPartitioner;
>>>> live migration from/to Paxos is also undergoing fine-tuning and validation.
>>>> There are probably some other things I am forgetting.
>>>> >>
>>>> >> >>>
>>>> >>
>>>> >> >>> Altogether the feature is fairly mature, despite these caveats.
>>>> This is the fruit of the labour of a long list of contributors, including
>>>> Aleksey Yeschenko, Alex Petrov, Ariel Weisberg, Blake Eggleston, Caleb
>>>> Rackliffe and David Capwell, and represents a huge undertaking. It also
>>>> wouldn’t have been possible without the work of Alex Petrov, Marcus
>>>> Eriksson and Sam Tunnicliffe on delivering transactional cluster metadata.
>>>> I hope you will join me in thanking them all for their contributions.
>>>> >>
>>>> >> >>>
>>>> >>
>>>> >> >>> Alex has also kindly produced some initial overview
>>>> documentation for developers, that can be found here:
>>>> https://github.com/apache/cassandra/blob/cep-15-accord/doc/modules/cassandra/pages/developing/accord/index.adoc.
>>>> This will be expanded as time permits.
>>>> >>
>>>> >> >>>
>>>> >>
>>>> >> >>> Does anyone have any questions or concerns?
>>>> >>
>>>> >> >>>
>>>> >>
>>>> >> >>>
>>>> >>
>>>> >> >>
>>>> >>
>>>> >> >>
>>>> >>
>>>> >> >> --
>>>> >>
>>>> >> >> Dmitry Konstantinov
>>>> >>
>>>> >> >
>>>> >>
>>>> >> >
>>>> >>
>>>> >>
>>>>
>>>>
>>>>
>>>
>
>
>
