Thank you for the update and a BIG thanks to all involved in getting us to this milestone. Looking forward to this work being merged in so we can kick the tires and help surface any issues early.
On Tue, Mar 4, 2025 at 8:01 AM Benedict Elliott Smith <bened...@apache.org> wrote:
> Hi everyone,
>
> It’s been exactly 3.5 years since the first commit to cassandra-accord.
> Yes, really, it’s been that long.
>
> We will be starting to validate the feature against real workloads in the
> near future, so we can’t sensibly push off merging much longer. The
> following is a brief run-down of the state of play. There are no known
> bugs, but there remain a number of caveats we will be incrementally
> addressing in the run-up to a full release:
>
> [1] Accord is likely to be SLOW until further optimisations are implemented
> [2] Schema changes have a number of hard edges
> [3] Validation is ongoing, so there are likely still a number of bugs to
> shake out
> [4] Many operator visibility/tooling/documentation improvements are pending
>
> To expand a little:
>
> [1] As of the last experiment we conducted, accord’s throughput was poor -
> also leading to higher LAN latencies. We have done no WAN experiments to
> date, but the protocol guarantees should already achieve better round-trip
> performance, in particular under contention. Improving throughput will be
> the main focus of attention once we are satisfied the protocol is otherwise
> stable, but our focus remains validation for the moment.
> [2] Schema changes have not yet been well integrated with TCM. Dropping a
> table for instance will currently cause problems if nodes are offline.
> [3] We have a range of validations we are already performing against
> cassandra-accord directly, and against its integration with Cassandra in
> cep-15-accord. We have run hundreds of billions of simulated transactions,
> and are still discovering some minor fault every few billion simulated
> transactions or so. There remains a lot more simulated validation to
> explore, as well as with real clusters serving real workloads.
> [4] There are already a range of virtual tables for exploring internal
> state in Accord, and reasonably good metric support. However, tracing is
> not yet supported, and our metric and virtual table integrations need some
> further development.
> [5] There are also other edge cases to address such as ensuring we do not
> reuse HLCs after restart, supporting ByteOrderPartitioner, and live
> migration from/to Paxos is undergoing fine-tuning and validation; probably
> there are some other things I am forgetting.
>
> Altogether the feature is fairly mature, despite these caveats. This is
> the fruit of the labour of a long list of contributors, including Aleksey
> Yeschenko, Alex Petrov, Ariel Weisberg, Blake Eggleston, Caleb Rackliffe
> and David Capwell, and represents a huge undertaking. It also wouldn’t have
> been possible without the work of Alex Petrov, Marcus Eriksson and Sam
> Tunnicliffe on delivering transactional cluster metadata. I hope you will
> join me in thanking them all for their contributions.
>
> Alex has also kindly produced some initial overview documentation for
> developers, that can be found here:
> https://github.com/apache/cassandra/blob/cep-15-accord/doc/modules/cassandra/pages/developing/accord/index.adoc.
> This will be expanded as time permits.
>
> Does anyone have any questions or concerns?