I got some feedback last week that I should try this on Monday morning, so let's see if we can nudge a few people into action this week.
3.0.15 and 3.11.1 are released. This is a dev list, so that shouldn't be a surprise to anyone here - you should have seen the votes and release notifications. The people working directly ON Cassandra every day are probably very aware of the number and nature of fixes in those versions - if you're not, the change lists are HUGE, and some of the fixes are VERY IMPORTANT. So this week's wrap-up is really a reflection on the size of those two release changelogs.

One of the advantages of the Cassandra project is the size of the user base. I don't know if we have accurate counts (and some of the "surveys" are laughable), but we know it's on the order of thousands (probably tens of thousands) of companies, and some huge number of instances (not willing to speculate here; we know it's at least in the hundreds of thousands, may be well into the millions). Historically, the best stabilizer of a release was people upgrading their unusual use cases, finding bugs that the developers hadn't anticipated (and therefore hadn't written tests for), reporting them, and the next release being slightly better than the one before it. The chicken/egg problem here is pretty obvious, and while a lot of us are spending a lot of time making things better, I want to use this email to ask a favor (in 3 parts):

1) If you haven't tried 3.0 or 3.11 yet, please spin one up on a test cluster. 3.11 would be better; 3.0 is ok too. It doesn't need to be a thousand-node cluster - most of the weird stuff we've seen in the post-3.0 world involves data, not cluster size. Grab some of your prod data if you can, throw it into a test cluster, add a node, remove a node, and tell us if it doesn't work.

2) Please run a stress workload against that test cluster, even if it's only 5-10 minutes. The purpose here is twofold: like #1, it'll help us find edge cases we haven't seen before, but it'll also help us identify holes in stress coverage.
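For anyone who hasn't written one before, here's a minimal sketch of what a cassandra-stress user profile can look like (the keyspace, table, and column names below are made up - mirror your real schema instead):

```yaml
# Hypothetical profile (stress.yaml) - replace names/types with your real schema.
keyspace: stress_ks
keyspace_definition: |
  CREATE KEYSPACE stress_ks WITH replication =
    {'class': 'SimpleStrategy', 'replication_factor': 3};

table: events
table_definition: |
  CREATE TABLE events (
    id text,
    bucket int,
    ts timestamp,
    payload blob,
    PRIMARY KEY ((id, bucket), ts)
  )

columnspec:
  - name: id
    size: uniform(8..32)        # id lengths between 8 and 32 chars
    population: uniform(1..1M)  # roughly 1M distinct values
  - name: payload
    size: gaussian(100..2000)   # payload sizes centered in this range

insert:
  partitions: fixed(1)          # write one partition per batch
  batchtype: UNLOGGED

queries:
  readrow:
    cql: select * from events where id = ? and bucket = ?
    fields: samerow
```

You'd then run it with something like `cassandra-stress user profile=stress.yaml ops(insert=4,readrow=1) duration=10m -rate threads=50`, adjusting the op ratio to match your actual read/write mix.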
We have some tickets to add UDTs ( https://issues.apache.org/jira/browse/CASSANDRA-13260 ) and LWTs ( https://issues.apache.org/jira/browse/CASSANDRA-7960 ) to stress. Ideally your stress profile should be more than "80% reads / 20% writes" - try to actually model your schema and query behavior. Do you use static columns? Do you use collections? If you're unable to model your use case because of a deficiency in stress, open a JIRA. If things break, open a JIRA. If it works perfectly, I'm interested in seeing your stress yaml and results (please send them to me privately; don't spam the list).

3) If you're somehow not able to run stress because you don't have hardware for a spare cluster, profiling your live cluster is also incredibly useful. TLP has some notes on how to generate flame graphs ( https://github.com/thelastpickle/lightweight-java-profiler ), and I saw one example from a cluster that really surprised me. There are versions and use cases that we know have been heavily profiled, but there are probably versions and use cases where nobody's ever run much in the way of profiling. If you're running openjdk in prod and you're able to SAFELY attach a profiler to generate some flame graphs, please send those to me (again, privately - I don't think the whole list needs a copy).

My hope in all of this is to build up a corpus of real-world use cases (and real current state via profiling) that we can leverage to make testing and performance better going forward. If I get much in the way of response to any of these, I'll try to send out a summary in next week's email.

- Jeff