Thank you to all Accord and TCM contributors! It is really exciting to see the development of such huge and wonderful features moving forward and opening the door to a new Cassandra epoch!
On Tue, 4 Mar 2025 at 20:45, Blake Eggleston <bl...@ultrablake.com> wrote:

> Thanks Benedict!
>
> I’m really excited to see accord reach this milestone, even with these
> caveats. You seem to have left yourself off the list of contributors
> though, even though you’ve been a central figure in its development :) So
> thanks to all accord & tcm contributors, including Benedict, for making
> this possible!
>
> On Tue, Mar 4, 2025, at 8:00 AM, Benedict Elliott Smith wrote:
>
> Hi everyone,
>
> It’s been exactly 3.5 years since the first commit to cassandra-accord.
> Yes, really, it’s been that long.
>
> We will be starting to validate the feature against real workloads in the
> near future, so we can’t sensibly push off merging much longer. The
> following is a brief run-down of the state of play. There are no known
> bugs, but there remain a number of caveats we will be incrementally
> addressing in the run-up to a full release:
>
> [1] Accord is likely to be SLOW until further optimisations are implemented
> [2] Schema changes have a number of hard edges
> [3] Validation is ongoing, so there are likely still a number of bugs to
> shake out
> [4] Many operator visibility/tooling/documentation improvements are pending
>
> To expand a little:
>
> [1] As of the last experiment we conducted, accord’s throughput was poor -
> also leading to higher LAN latencies. We have done no WAN experiments to
> date, but the protocol guarantees should already achieve better round-trip
> performance, in particular under contention. Improving throughput will be
> the main focus of attention once we are satisfied the protocol is otherwise
> stable, but our focus remains validation for the moment.
> [2] Schema changes have not yet been well integrated with TCM. Dropping a
> table for instance will currently cause problems if nodes are offline.
> [3] We have a range of validations we are already performing against
> cassandra-accord directly, and against its integration with Cassandra in
> cep-15-accord. We have run hundreds of billions of simulated transactions,
> and are still discovering some minor fault every few billion simulated
> transactions or so. There remains a lot more simulated validation to
> explore, as well as with real clusters serving real workloads.
> [4] There are already a range of virtual tables for exploring internal
> state in Accord, and reasonably good metric support. However, tracing is
> not yet supported, and our metric and virtual table integrations need some
> further development.
> [5] There are also other edge cases to address such as ensuring we do not
> reuse HLCs after restart, supporting ByteOrderPartitioner, and live
> migration from/to Paxos is undergoing fine-tuning and validation; probably
> there are some other things I am forgetting.
>
> Altogether the feature is fairly mature, despite these caveats. This is
> the fruit of the labour of a long list of contributors, including Aleksey
> Yeschenko, Alex Petrov, Ariel Weisberg, Blake Eggleston, Caleb Rackliffe
> and David Capwell, and represents a huge undertaking. It also wouldn’t have
> been possible without the work of Alex Petrov, Marcus Eriksson and Sam
> Tunnicliffe on delivering transactional cluster metadata. I hope you will
> join me in thanking them all for their contributions.
>
> Alex has also kindly produced some initial overview documentation for
> developers, that can be found here:
> https://github.com/apache/cassandra/blob/cep-15-accord/doc/modules/cassandra/pages/developing/accord/index.adoc.
> This will be expanded as time permits.
>
> Does anyone have any questions or concerns?

--
Dmitry Konstantinov