Hi everyone,

It’s been exactly 3.5 years since the first commit to cassandra-accord. Yes, 
really, it’s been that long.

We will be starting to validate the feature against real workloads in the near 
future, so we can’t sensibly push off merging much longer. The following is a 
brief run-down of the state of play. There are no known bugs, but there remain 
a number of caveats we will be incrementally addressing in the run-up to a full 
release:

[1] Accord is likely to be SLOW until further optimisations are implemented
[2] Schema changes have a number of hard edges
[3] Validation is ongoing, so there are likely still a number of bugs to shake 
out
[4] Many operator visibility/tooling/documentation improvements are pending

To expand a little: 

[1] As of the last experiment we conducted, Accord’s throughput was poor, which 
also led to higher LAN latencies. We have done no WAN experiments to date, but 
the protocol guarantees should already achieve better round-trip performance, 
in particular under contention. Improving throughput will be the main focus of 
attention once we are satisfied the protocol is otherwise stable, but our focus 
remains validation for the moment.
[2] Schema changes have not yet been well integrated with TCM. Dropping a 
table, for instance, will currently cause problems if nodes are offline.
[3] We have a range of validations we are already performing against 
cassandra-accord directly, and against its integration with Cassandra in 
cep-15-accord. We have run hundreds of billions of simulated transactions, and 
are still discovering a minor fault every few billion simulated transactions 
or so. There remains a lot more simulated validation to explore, as well as 
validation with real clusters serving real workloads.
[4] There are already a range of virtual tables for exploring internal state in 
Accord, and reasonably good metric support. However, tracing is not yet 
supported, and our metric and virtual table integrations need some further 
development.
[5] There are also other edge cases to address, such as ensuring we do not 
reuse HLCs after restart and supporting ByteOrderedPartitioner. Live migration 
from/to Paxos is also undergoing fine-tuning and validation. There are probably 
some other things I am forgetting.

Altogether the feature is fairly mature, despite these caveats. This is the 
fruit of the labour of a long list of contributors, including Aleksey 
Yeschenko, Alex Petrov, Ariel Weisberg, Blake Eggleston, Caleb Rackliffe and 
David Capwell, and represents a huge undertaking. It also wouldn’t have been 
possible without the work of Alex Petrov, Marcus Eriksson and Sam Tunnicliffe 
on delivering transactional cluster metadata. I hope you will join me in 
thanking them all for their contributions.

Alex has also kindly produced some initial overview documentation for 
developers, which can be found here: 
https://github.com/apache/cassandra/blob/cep-15-accord/doc/modules/cassandra/pages/developing/accord/index.adoc
This will be expanded as time permits.

Does anyone have any questions or concerns?
