Re: CEP-15 Update

Benedict Elliott Smith Wed, 05 Mar 2025 01:22:23 -0800

Thanks everyone. 

Jon - your help will be greatly appreciated. We’ll let you know when we’ve got 
the cycles to invest in performance work (hopefully fairly soon). I expect the 
first step will be improving visibility so we can better understand what the 
system is doing (particularly the caching layers), but we can dig in together 
when ready.


> On 4 Mar 2025, at 18:15, Jon Haddad <j...@rustyrazorblade.com> wrote:
> 
> Very exciting!  
> 
> I have a client that's very interested in Accord, so I should have budget to 
> dig into it, especially on the performance side of things.
> 
> Jon
> 
> On Tue, Mar 4, 2025 at 9:57 AM Dmitry Konstantinov <netud...@gmail.com> wrote:
>> Thank you to all Accord and TCM contributors. It is really exciting to see 
>> the development of such huge and wonderful features moving forward and 
>> opening the door to the new Cassandra epoch!
>> 
>> On Tue, 4 Mar 2025 at 20:45, Blake Eggleston <bl...@ultrablake.com> wrote:
>>> Thanks Benedict!
>>> 
>>> I’m really excited to see accord reach this milestone, even with these 
>>> caveats. You seem to have left yourself off the list of contributors 
>>> though, even though you’ve been a central figure in its development :) So 
>>> thanks to all accord & tcm contributors, including Benedict, for making 
>>> this possible!
>>> 
>>> On Tue, Mar 4, 2025, at 8:00 AM, Benedict Elliott Smith wrote:
>>>> Hi everyone,
>>>> 
>>>> It’s been exactly 3.5 years since the first commit to cassandra-accord. 
>>>> Yes, really, it’s been that long.
>>>> 
>>>> We will be starting to validate the feature against real workloads in the 
>>>> near future, so we can’t sensibly push off merging much longer. The 
>>>> following is a brief run-down of the state of play. There are no known 
>>>> bugs, but there remain a number of caveats we will be incrementally 
>>>> addressing in the run-up to a full release:
>>>> 
>>>> [1] Accord is likely to be SLOW until further optimisations are implemented
>>>> [2] Schema changes have a number of hard edges
>>>> [3] Validation is ongoing, so there are likely still a number of bugs to 
>>>> shake out
>>>> [4] Many operator visibility/tooling/documentation improvements are pending
>>>> 
>>>> To expand a little: 
>>>> 
>>>> [1] As of the last experiment we conducted, accord’s throughput was poor - 
>>>> also leading to higher LAN latencies. We have done no WAN experiments to 
>>>> date, but the protocol guarantees should already achieve better round-trip 
>>>> performance, in particular under contention. Improving throughput will be 
>>>> the main focus of attention once we are satisfied the protocol is 
>>>> otherwise stable, but our focus remains validation for the moment.
>>>> [2] Schema changes have not yet been well integrated with TCM. Dropping a 
>>>> table for instance will currently cause problems if nodes are offline.
>>>> [3] We have a range of validations we are already performing against 
>>>> cassandra-accord directly, and against its integration with Cassandra in 
>>>> cep-15-accord. We have run hundreds of billions of simulated transactions, 
>>>> and are still discovering some minor fault every few billion simulated 
>>>> transactions or so. There remains a lot more simulated validation to 
>>>> explore, as well as validation with real clusters serving real workloads.
>>>> [4] There are already a range of virtual tables for exploring internal 
>>>> state in Accord, and reasonably good metric support. However, tracing is 
>>>> not yet supported, and our metric and virtual table integrations need some 
>>>> further development.
>>>> [5] There are also other edge cases to address, such as ensuring we do not 
>>>> reuse HLCs after restart and supporting ByteOrderPartitioner. Live 
>>>> migration from/to Paxos is undergoing fine-tuning and validation, and 
>>>> there are probably some other things I am forgetting.
>>>> 
>>>> Altogether the feature is fairly mature, despite these caveats. This is 
>>>> the fruit of the labour of a long list of contributors, including Aleksey 
>>>> Yeschenko, Alex Petrov, Ariel Weisberg, Blake Eggleston, Caleb Rackliffe 
>>>> and David Capwell, and represents a huge undertaking. It also wouldn’t 
>>>> have been possible without the work of Alex Petrov, Marcus Eriksson and 
>>>> Sam Tunnicliffe on delivering transactional cluster metadata. I hope you 
>>>> will join me in thanking them all for their contributions.
>>>> 
>>>> Alex has also kindly produced some initial overview documentation for 
>>>> developers, which can be found here: 
>>>> https://github.com/apache/cassandra/blob/cep-15-accord/doc/modules/cassandra/pages/developing/accord/index.adoc
>>>> This will be expanded as time permits.
>>>> 
>>>> Does anyone have any questions or concerns?
>>> 
>> 
>> 
>> 
>> --
>> Dmitry Konstantinov
