Final update: David has completed a second rebase after we reached parity with 
trunk on our CI, and has confirmed tests remain stable. So I expect CEP-15 to 
merge to trunk sometime today.

No doubt there will be some unexpected disruption to others after a patch like 
this lands. Reach out via Slack if you have any trouble.

> On 16 Mar 2025, at 10:44, Benedict Elliott Smith <bened...@apache.org> wrote:
> 
> Hi everyone,
> 
> To update you: the last patches we considered blockers have landed in the 
> cep-15-accord branch. Caleb has now started rebasing the branch onto trunk. I 
> expect there will be a few failing tests still to resolve at that point, but 
> once they have been squashed we will proceed with the merge.
> 
> There remains more work to do before release, and I will publish a detailed 
> roadmap to Jira when I’m back in a couple of weeks. 
> 
> 
>> On 11 Mar 2025, at 20:12, Nate McCall <zznat...@gmail.com> wrote:
>> 
>> It sounds like we are all pretty interested in seeing this feature land and 
>> the branch maintenance is causing overhead that could be spent on 
>> finalisation. +1 on merging, particularly given the feature flag work. 
>> 
>> Once more unto the breach 💪
>> 
>> On Fri, 7 Mar 2025 at 6:56 PM, Benedict <bened...@apache.org 
>> <mailto:bened...@apache.org>> wrote:
>>> There are essentially three possible timelines to choose from here: 
>>> 
>>> 1) We agree in the next few days to merge to trunk. We will then prioritise 
>>> rebasing onto trunk and resolving any pre-merge items starting next week.
>>> 2) There’s some more debate and agreement to merge to trunk in a week or 
>>> two. In the meantime we will shift to internal-first development but we’ll 
>>> likely prioritise the above work as soon as we can, which may be in a few 
>>> weeks, so we can shift to trunk first development.
>>> 3) We don’t agree to merge accord anytime soon, so we shift to 
>>> internal-first development for the time being. I’m not sure when we will 
>>> prioritise any of the above.
>>> 
>>> Our resources are finite and we’ve exhausted them (literally), so it’s 
>>> pretty much pick one of the above. I don’t really mind which you pick, but 
>>> I won’t personally be prioritising merge after this third attempt.
>>> 
>>>> On 6 Mar 2025, at 22:01, Jon Haddad <j...@rustyrazorblade.com 
>>>> <mailto:j...@rustyrazorblade.com>> wrote:
>>>> 
>>>> 
>>> 
>>>> Hmm... I took a look at the cep-15-accord branch in GitHub, it looks like 
>>>> it's several hundred commits behind trunk.  Since you'll need to rebase 
>>>> again before merge *anyways*, would it make sense to do it once more, and 
>>>> I can publish easy-cass-lab with the latest branch?  If folks have 
>>>> concerns, it's easy to fire up a cluster (I do it constantly) and try it 
>>>> out.
>>>> 
>>>> I think if we were to do this, out of consideration we should time box the 
>>>> amount of time for an evaluation and unless someone raises an objection, 
>>>> consider lazy consensus achieved.
>>>> 
>>>> Jon
>>>> 
>>>> 
>>>> 
>>>> On Thu, Mar 6, 2025 at 12:46 PM Benedict Elliott Smith 
>>>> <bened...@apache.org <mailto:bened...@apache.org>> wrote:
>>>>> Because we want to validate against the latest code in trunk, else we are 
>>>>> validating stale behaviours. The cost of rebasing is high, so we do not 
>>>>> do it frequently. That means we will likely stop developing OSS-first, as 
>>>>> the focus will have to move to our internal branch that satisfies these 
>>>>> criteria.
>>>>> 
>>>>> Exactly what this might mean for upstreaming I cannot say. Personally, I 
>>>>> aim to work exclusively on the branch we are stabilising. If that is not 
>>>>> trunk, the latency for my contributions being made public might be high, 
>>>>> as I have a huge imbalance of over-investment to recoup, and anything 
>>>>> unnecessary will be deferred.
>>>>> 
>>>>> Since the feature is disabled, and the code is almost entirely isolated, 
>>>>> I cannot imagine the cost to the community of removing this work would be 
>>>>> very high. But, I do not intend to argue Accord’s case here. I will let 
>>>>> you all decide.
>>>>> 
>>>>> Please decide soon though, as it shapes our work planning. The positive 
>>>>> reception so far had led me to consider prioritising a move to 
>>>>> trunk-first development within the next week or two, and the associated 
>>>>> work that entails. However, if that was optimistic we will have to shift 
>>>>> our plans.
>>>>> 
>>>>> 
>>>>> 
>>>>>> On 6 Mar 2025, at 20:16, Jordan West <jw...@apache.org 
>>>>>> <mailto:jw...@apache.org>> wrote:
>>>>>> 
>>>>>> The work and effort in accord has been amazing. And I’m sure it sets a 
>>>>>> new standard for code quality and correctness testing which I’m also 
>>>>>> entirely behind. I also trust the folks working on it want to take it to 
>>>>>> a fully production-ready solution. But I’m worried about 
>>>>>> circumstances out of our control leaving us with a very complex feature 
>>>>>> that isn’t complete. 
>>>>>> 
>>>>>> I do have some questions. Could folks help me better understand why 
>>>>>> testing real workloads necessitates a merge (my understanding from the 
>>>>>> original reason is this is the impetus for why we would merge now)? Also 
>>>>>> I think the performance and schema change caveats are rather large ones. 
>>>>>> One of accord’s promises was better performance and I think making schema 
>>>>>> changes with nodes down not being supported is a big gap. Could we have 
>>>>>> some criteria like “supports all the operations PaxosV2 supports” or 
>>>>>> “performs as well or better than PaxosV2 on [workload(s)]”? 
>>>>>> 
>>>>>> I understand waiting asks a lot of the authors in terms of bearing the 
>>>>>> burden of a more complex merge. But I think we also need to consider 
>>>>>> what merging is asking the community to bear if the worst happens and we 
>>>>>> are unable to take the feature from its current state to something that 
>>>>>> can be widely used in production.
>>>>>> 
>>>>>> 
>>>>>> Jordan 
>>>>>> 
>>>>>> 
>>>>>> On Wed, Mar 5, 2025 at 15:52 Blake Eggleston <bl...@ultrablake.com 
>>>>>> <mailto:bl...@ultrablake.com>> wrote:
>>>>>>> +1 to merging it
>>>>>>> 
>>>>>>> On Wed, Mar 5, 2025, at 12:22 PM, Patrick McFadin wrote:
>>>>>>>> You have my +1
>>>>>>>> 
>>>>>>>> On Wed, Mar 5, 2025 at 12:16 PM Benedict <bened...@apache.org 
>>>>>>>> <mailto:bened...@apache.org>> wrote:
>>>>>>>> >
>>>>>>>> > Correct, these caveats should only apply to tables that have 
>>>>>>>> > opted-in to accord.
>>>>>>>> >
>>>>>>>> > On 5 Mar 2025, at 20:08, Jeremiah Jordan <jerem...@apache.org 
>>>>>>>> > <mailto:jerem...@apache.org>> wrote:
>>>>>>>> >
>>>>>>>> > 
>>>>>>>> > So great to see all this hard work about to pay off!
>>>>>>>> >
>>>>>>>> > On the questions/concerns front, the only concern I would have 
>>>>>>>> > towards merging this to trunk is if any of the caveats apply when 
>>>>>>>> > someone is not using Accord.  Assuming they only apply when the 
>>>>>>>> > feature flag is enabled, I see no reason not to get this merged into 
>>>>>>>> > trunk once everyone involved is happy with the state of it.
>>>>>>>> >
>>>>>>>> > -Jeremiah
>>>>>>>> >
>>>>>>>> > On Mar 5, 2025 at 12:15:23 PM, Benedict Elliott Smith 
>>>>>>>> > <bened...@apache.org <mailto:bened...@apache.org>> wrote:
>>>>>>>> >>
>>>>>>>> >> That depends on all of you lovely people :D
>>>>>>>> >>
>>>>>>>> >> I think we should have finished merging everything we want before 
>>>>>>>> >> QA by ~Monday; certainly not much later.
>>>>>>>> >>
>>>>>>>> >> I think we have some upgrade and python dtest failures to address 
>>>>>>>> >> as well.
>>>>>>>> >>
>>>>>>>> >> So it could be pretty soon if the community is supportive.
>>>>>>>> >>
>>>>>>>> >> On 5 Mar 2025, at 17:22, Patrick McFadin <pmcfa...@gmail.com 
>>>>>>>> >> <mailto:pmcfa...@gmail.com>> wrote:
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >> What is the timing for starting the merge process? I'm asking 
>>>>>>>> >> because
>>>>>>>> >>
>>>>>>>> >> I have (yet another) presentation and this would be a cool update.
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >> On Wed, Mar 5, 2025 at 1:22 AM Benedict Elliott Smith
>>>>>>>> >>
>>>>>>>> >> <bened...@apache.org <mailto:bened...@apache.org>> wrote:
>>>>>>>> >>
>>>>>>>> >> >
>>>>>>>> >>
>>>>>>>> >> > Thanks everyone.
>>>>>>>> >>
>>>>>>>> >> >
>>>>>>>> >>
>>>>>>>> >> > Jon - your help will be greatly appreciated. We’ll let you know 
>>>>>>>> >> > when we’ve got the cycles to invest in performance work 
>>>>>>>> >> > (hopefully fairly soon). I expect the first step will be 
>>>>>>>> >> > improving visibility so we can better understand what the system 
>>>>>>>> >> > is doing (particularly the caching layers), but we can dig in 
>>>>>>>> >> > together when ready.
>>>>>>>> >>
>>>>>>>> >> >
>>>>>>>> >>
>>>>>>>> >> > On 4 Mar 2025, at 18:15, Jon Haddad <j...@rustyrazorblade.com 
>>>>>>>> >> > <mailto:j...@rustyrazorblade.com>> wrote:
>>>>>>>> >>
>>>>>>>> >> >
>>>>>>>> >>
>>>>>>>> >> > Very exciting!
>>>>>>>> >>
>>>>>>>> >> >
>>>>>>>> >>
>>>>>>>> >> > I have a client that's very interested in Accord, so I should 
>>>>>>>> >> > have budget to dig into it, especially on the performance side of 
>>>>>>>> >> > things.
>>>>>>>> >>
>>>>>>>> >> >
>>>>>>>> >>
>>>>>>>> >> > Jon
>>>>>>>> >>
>>>>>>>> >> >
>>>>>>>> >>
>>>>>>>> >> > On Tue, Mar 4, 2025 at 9:57 AM Dmitry Konstantinov 
>>>>>>>> >> > <netud...@gmail.com <mailto:netud...@gmail.com>> wrote:
>>>>>>>> >>
>>>>>>>> >> >>
>>>>>>>> >>
>>>>>>>> >> >> Thank you to all Accord and TCM contributors, it is really 
>>>>>>>> >> >> exciting to see a development of such huge and wonderful 
>>>>>>>> >> >> features moving forward and opening the door to the new 
>>>>>>>> >> >> Cassandra epoch!
>>>>>>>> >>
>>>>>>>> >> >>
>>>>>>>> >>
>>>>>>>> >> >> On Tue, 4 Mar 2025 at 20:45, Blake Eggleston 
>>>>>>>> >> >> <bl...@ultrablake.com <mailto:bl...@ultrablake.com>> wrote:
>>>>>>>> >>
>>>>>>>> >> >>>
>>>>>>>> >>
>>>>>>>> >> >>> Thanks Benedict!
>>>>>>>> >>
>>>>>>>> >> >>>
>>>>>>>> >>
>>>>>>>> >> >>> I’m really excited to see accord reach this milestone, even 
>>>>>>>> >> >>> with these caveats. You seem to have left yourself off the list 
>>>>>>>> >> >>> of contributors though, even though you’ve been a central 
>>>>>>>> >> >>> figure in its development :) So thanks to all accord & tcm 
>>>>>>>> >> >>> contributors, including Benedict, for making this possible!
>>>>>>>> >>
>>>>>>>> >> >>>
>>>>>>>> >>
>>>>>>>> >> >>> On Tue, Mar 4, 2025, at 8:00 AM, Benedict Elliott Smith wrote:
>>>>>>>> >>
>>>>>>>> >> >>>
>>>>>>>> >>
>>>>>>>> >> >>> Hi everyone,
>>>>>>>> >>
>>>>>>>> >> >>>
>>>>>>>> >>
>>>>>>>> >> >>> It’s been exactly 3.5 years since the first commit to 
>>>>>>>> >> >>> cassandra-accord. Yes, really, it’s been that long.
>>>>>>>> >>
>>>>>>>> >> >>>
>>>>>>>> >>
>>>>>>>> >> >>> We will be starting to validate the feature against real 
>>>>>>>> >> >>> workloads in the near future, so we can’t sensibly push off 
>>>>>>>> >> >>> merging much longer. The following is a brief run-down of the 
>>>>>>>> >> >>> state of play. There are no known bugs, but there remain a 
>>>>>>>> >> >>> number of caveats we will be incrementally addressing in the 
>>>>>>>> >> >>> run-up to a full release:
>>>>>>>> >>
>>>>>>>> >> >>>
>>>>>>>> >>
>>>>>>>> >> >>> [1] Accord is likely to be SLOW until further optimisations are 
>>>>>>>> >> >>> implemented
>>>>>>>> >>
>>>>>>>> >> >>> [2] Schema changes have a number of hard edges
>>>>>>>> >>
>>>>>>>> >> >>> [3] Validation is ongoing, so there are likely still a number 
>>>>>>>> >> >>> of bugs to shake out
>>>>>>>> >>
>>>>>>>> >> >>> [4] Many operator visibility/tooling/documentation improvements 
>>>>>>>> >> >>> are pending
>>>>>>>> >>
>>>>>>>> >> >>>
>>>>>>>> >>
>>>>>>>> >> >>> To expand a little:
>>>>>>>> >>
>>>>>>>> >> >>>
>>>>>>>> >>
>>>>>>>> >> >>> [1] As of the last experiment we conducted, accord’s throughput 
>>>>>>>> >> >>> was poor - also leading to higher LAN latencies. We have done 
>>>>>>>> >> >>> no WAN experiments to date, but the protocol guarantees should 
>>>>>>>> >> >>> already achieve better round-trip performance, in particular 
>>>>>>>> >> >>> under contention. Improving throughput will be the main focus 
>>>>>>>> >> >>> of attention once we are satisfied the protocol is otherwise 
>>>>>>>> >> >>> stable, but our focus remains validation for the moment.
>>>>>>>> >>
>>>>>>>> >> >>> [2] Schema changes have not yet been well integrated with TCM. 
>>>>>>>> >> >>> Dropping a table for instance will currently cause problems if 
>>>>>>>> >> >>> nodes are offline.
>>>>>>>> >>
>>>>>>>> >> >>> [3] We have a range of validations we are already performing 
>>>>>>>> >> >>> against cassandra-accord directly, and against its integration 
>>>>>>>> >> >>> with Cassandra in cep-15-accord. We have run hundreds of 
>>>>>>>> >> >>> billions of simulated transactions, and are still discovering 
>>>>>>>> >> >>> some minor fault every few billion simulated transactions or 
>>>>>>>> >> >>> so. There remains a lot more simulated validation to explore, 
>>>>>>>> >> >>> as well as with real clusters serving real workloads.
>>>>>>>> >>
>>>>>>>> >> >>> [4] There are already a range of virtual tables for exploring 
>>>>>>>> >> >>> internal state in Accord, and reasonably good metric support. 
>>>>>>>> >> >>> However, tracing is not yet supported, and our metric and 
>>>>>>>> >> >>> virtual table integrations need some further development.
>>>>>>>> >>
>>>>>>>> >> >>> [5] There are also other edge cases to address such as ensuring 
>>>>>>>> >> >>> we do not reuse HLCs after restart and supporting 
>>>>>>>> >> >>> ByteOrderPartitioner; live migration from/to Paxos is also 
>>>>>>>> >> >>> undergoing fine-tuning and validation. Probably there are some 
>>>>>>>> >> >>> other things I am forgetting.
>>>>>>>> >>
>>>>>>>> >> >>>
>>>>>>>> >>
>>>>>>>> >> >>> Altogether the feature is fairly mature, despite these caveats. 
>>>>>>>> >> >>> This is the fruit of the labour of a long list of contributors, 
>>>>>>>> >> >>> including Aleksey Yeschenko, Alex Petrov, Ariel Weisberg, Blake 
>>>>>>>> >> >>> Eggleston, Caleb Rackliffe and David Capwell, and represents a 
>>>>>>>> >> >>> huge undertaking. It also wouldn’t have been possible without 
>>>>>>>> >> >>> the work of Alex Petrov, Marcus Eriksson and Sam Tunnicliffe on 
>>>>>>>> >> >>> delivering transactional cluster metadata. I hope you will join 
>>>>>>>> >> >>> me in thanking them all for their contributions.
>>>>>>>> >>
>>>>>>>> >> >>>
>>>>>>>> >>
>>>>>>>> >> >>> Alex has also kindly produced some initial overview 
>>>>>>>> >> >>> documentation for developers, that can be found here: 
>>>>>>>> >> >>> https://github.com/apache/cassandra/blob/cep-15-accord/doc/modules/cassandra/pages/developing/accord/index.adoc.
>>>>>>>> >> >>>  This will be expanded as time permits.
>>>>>>>> >>
>>>>>>>> >> >>>
>>>>>>>> >>
>>>>>>>> >> >>> Does anyone have any questions or concerns?
>>>>>>>> >>
>>>>>>>> >> >>>
>>>>>>>> >>
>>>>>>>> >> >>>
>>>>>>>> >>
>>>>>>>> >> >>
>>>>>>>> >>
>>>>>>>> >> >>
>>>>>>>> >>
>>>>>>>> >> >> --
>>>>>>>> >>
>>>>>>>> >> >> Dmitry Konstantinov
>>>>>>>> >>
>>>>>>>> >> >
>>>>>>>> >>
>>>>>>>> >> >
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> 
>>>>>>> 
>>>>> 
> 
