Great to see such a significant progress made in the area!

On Thu, Apr 4, 2019 at 1:13 PM Aleksey Yeschenko <> wrote:

> I would like to propose CASSANDRA-15066 [1] - an important set of bug fixes
> and stability improvements to internode messaging code that Benedict, I,
> and others have been working on for the past couple of months.
> First, some context.   This work started off as a review of CASSANDRA-14503
> (Internode connection management is race-prone [2]), CASSANDRA-13630
> (Support large internode messages with netty) [3], and a pre-4.0
> confirmatory review of such a major new feature.
> However, as we dug in, we realized this was insufficient. With more than 50
> bugs uncovered [4] - dozens of them critical to correctness and/or
> stability of the system - a substantial rework was necessary to guarantee a
> solid internode messaging subsystem for the 4.0 release.
> In addition to addressing all of the uncovered bugs [4] that were unique to
> trunk + 13630 [3] + 14503 [2], we used this opportunity to correct some
> long-existing, pre-4.0 bugs and stability issues. For the complete list of
> notable bug fixes, read the comments to CASSANDRA-15066 [1]. But I’d like
> to highlight a few.
> # Lack of message integrity checks
> It’s known that TCP checksums are too weak [5] and Ethernet CRC cannot be
> relied upon [6] for integrity. With sufficient scale or time, you will hit
> bit flips. Sadly, most of the time these go undetected.  Cassandra’s
> replication model makes this issue much more serious, as the faulty data
> can infect the cluster.
> We recognised this problem, and recently introduced a fix for server-client
> messages, implementing CRCs in CASSANDRA-13304 (Add checksumming to the
> native protocol) [7].
> But until CASSANDRA-15066 [1] lands, this is also a critical flaw
> internode. We have addressed it by ensuring that no matter what, whether
> you use SSL or not, whether you use internode compression or not, a
> protocol level CRC is always present, for every message frame. It’s our
> deep and sincere belief that shipping a new implementation of the messaging
> protocol without application-level data integrity checks would be
> unacceptable for a modern database.

I'm all for introducing more correctness checks at all levels especially in
Having dealt with multiple data corruption bugs that could have been easily
prevented by
having a checksum, it's great to see that we are moving in this direction.

> # Lack of back-pressure and memory usage constraints
> As it stands today, it’s far too easy for a single slow node to become
> overwhelmed by messages from its peers.  Conversely, multiple coordinators
> can be made unstable by the backlog of messages to deliver to just one
> struggling node.
> To address this problem, we have introduced strict memory usage constraints
> that apply TCP-level back-pressure, on both outbound and inbound.  It is
> now impossible for a node to be swamped on inbound, and on outbound it is
> made significantly harder to overcommit resources.  It’s a simple, reliable
> mechanism that drastically improves cluster stability under load, and
> especially overload.
> Cassandra is a mature system, and introducing an entirely new messaging
> implementation without resolving this fundamental stability issue is
> difficult to justify in our view.

I'd say that this is required to be able to ship 4.0 as a release focused
on stability.
I personally have been waiting for this to happen for years. Significant
step forward in our QoS story.

> # State of the patch, feature freeze and 4.0 timeline concerns
> The patch is essentially complete, with much improved unit tests all
> passing, dtests green, and extensive fuzz testing underway - with initial
> results all positive.  We intend to further improve in-code documentation
> and test coverage in the next week or two, and do some minor additional
> code review, but we believe it will be basically good to commit in a couple
> weeks.
> The patch could also use some performance testing. We are hoping that our
> colleagues at Netflix and new Cassandra committers Joey and Vinay will help
> us with this aspect.  However, this work needs to be done regardless, so
> provides no significant delay.
> I would classify absolutely most of the work done in this patch as
> necessary bug fixes and stability improvements - in line with the stated
> goals of the feature freeze.
> The only clear “feature” introduced is the expanded metrics, and virtual
> tables to observe them.  If the freeze is to be strictly interpreted these
> can be removed, but we think this would be unwise.
> We hope that the community will appreciate and welcome these changes.
> We’ve worked hard to deliver this promptly, to minimise delays to 4.0 were
> these issues to be addressed piecemeal, and we sincerely believe this work
> will have a positive impact on stability and performance of Cassandra
> clusters for years to come.
> P.S. I believe that once it’s committed, we should cut our first 4.0 alpha.
> It’s about time we started this train (:
> [1]
> [2]
> [3]
> [4]
> (incomplete list)
> [5]
> [6]
> [7]

Reply via email to