Hi Josh,

thanks for the status.

I would like to raise awareness that fixing CASSANDRA-17964 will bring two 
tests into CI that will start to fail. They were not executed as part of CI 
until now because of how they are named (their names do not end in *Test).

I believe that these tests will need to be addressed and fixed before 4.1 is 
out.

My email describing that in more detail is here (1).

(1) https://lists.apache.org/thread/pl0q1krhgv0rvybp5jmdy3411hchy28l

Regards,

Stefan


________________________________________
From: Josh McKenzie <jmcken...@apache.org>
Sent: Monday, November 7, 2022 22:59
To: dev
Subject: Cassandra project status update 2022-11-07




Oh good grief, it's been 26 days since I wrote one of these. My apologies! 
(Life happens - I can confirm that the terribly named "triple-demic" is real 
folks)

We've had a number of releases since the last status email. The current and 
latest supported GA cassandra releases across all branches are:

- cassandra 4: 4.0.7
- cassandra 3.11: 3.11.14
- cassandra 3.0: 3.0.28


[Needs Committers]
I'd like to first focus our attention on tickets that are flagged as "Needs 
Committer". Our project rules for Cassandra are that 2 committers need to sign 
off on a commit, so many times if an author or reviewer isn't yet a committer, 
these tickets can need external input to get into the codebase. The following 
URL is for a query to pull the Needs Committer tickets: 
https://issues.apache.org/jira/issues/?jql=project%20%3D%20CASSANDRA%20and%20resolution%20%3D%20unresolved%20and%20status%20%3D%20%22Needs%20Committer%22

CASSANDRA-17861, Update Python test framework from nose to pytest in CCM could 
use another committer on it: 
https://issues.apache.org/jira/browse/CASSANDRA-17861

CASSANDRA-17870, nodetool/rebuild: Add flag to exclude nodes from local 
datacenter could also use another committer on review: 
https://issues.apache.org/jira/browse/CASSANDRA-17870

CASSANDRA-15402, Make incremental backup configurable per keyspace and table 
looks like it has committer attention as per a recent comment so we're good 
there.

CASSANDRA-14930, decommission may cause timeout because messaging backlog is 
cleared: not sure why this one is marked as Needs Committer actually as it has 
2 as reviewer. Might just need a status update.

Before we get to 4.1 status, I'd like to call out that Trie memtables were 
merged in CASSANDRA-17240. This is a large body of novel work (that Branimir 
presented on at ApacheCon for those of you lucky enough to attend) and it's 
great to see this land in the project; it's worth your time to pop open that 
diff and take a look around and see some of the new things being added to 
Cassandra. Notably, there's some great discussion about property-based testing 
going on in the comments which has sparked some offline discussion about how we 
can integrate exploratory fuzz testing in our primary CI pipeline; more to come 
on that front as discussions evolve.


[4.1 status]
Let's move on to 4.1 status. We're down to 2 tickets blocking rc, and I'm given 
to understand that the one in progress is close to having something to review, 
so on the "outstanding work" side we're in great shape: 
https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=484

That leaves us with the question: what do we do about CI? We've recently 
expanded our governance options as to what we consider validated and cleared 
for release: 
https://cwiki.apache.org/confluence/display/CASSANDRA/Release+Lifecycle. 
Specifically:

"Three consecutive green runs of circleci OR of ASF CI are required to cut RC"

Our most recent run of 4.1 on ASF infra had 9 failures - 
https://butler.cassandra.apache.org/#/ci/upstream/compare/Cassandra-4.1/cassandra-4.1.
This has been trending up recently from a low of 1 a bit over a week ago; the 
lion's share of the failures look to be environmental with timeouts.

With ASF CI having stragglers that are flaking lately, option 2 would be three 
consecutive green runs on circleci, however in order for this to be 
representative we need some improvements to the test configuration in circle to 
get it into parity with the ASF env, as tracked in CASSANDRA-17930 here: 
https://issues.apache.org/jira/browse/CASSANDRA-17930. As of a recent comment 
Ekaterina's taking point on this and tracking that addition in CASSANDRA-18001: 
https://issues.apache.org/jira/browse/CASSANDRA-18001. Ekaterina - if there's 
anything other folks on the project can do to assist (including reviewing) 
please let us know.

So we do have a 3rd option we discussed in slack: running tests on the ASF 
infra and then selectively multiplexing failures on circle. If a test fails on 
ASF CI but passes 500 times on circle, the general consensus was that was 
sufficient for us to have confidence in the test. With the recent changes 
Andres introduced in CASSANDRA-17939, multiplexing multiple tests in circle has 
become very simple and you can see instructions on generating the correct 
circle config using .circleci/generate.sh --help (look for the REPEATED_UTESTS= 
, REPEATED_JVM_DTESTS=, etc options). This hybrid third approach (canonical run 
on ASF infra + multiplex failures on circle) gives us another outlet to get a 
validated release if necessary, albeit at the cost of more effort.
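
As a rough sketch of the hybrid rule described above (not an official project 
tool): take the tests that failed on ASF CI and treat one as cleared only once 
it has reached the 500-pass threshold on circle. The function name, the input 
format and the test names below are hypothetical and exist only for 
illustration.

```python
# Threshold taken from the text above: a test that fails on ASF CI but passes
# 500 times on circle is considered trustworthy.
REQUIRED_PASSES = 500


def still_blocking(asf_failures, circle_passes):
    """Return the ASF CI failures not yet cleared by repeated circle runs.

    asf_failures:  iterable of test names that failed on ASF CI.
    circle_passes: dict mapping test name -> number of passing repeated runs
                   on circle with no failures (hypothetical input format).
    """
    return sorted(
        test for test in asf_failures
        if circle_passes.get(test, 0) < REQUIRED_PASSES
    )


# Hypothetical test names, for illustration only.
failures = ["ExampleATest", "ExampleBTest", "ExampleCTest"]
repeated = {"ExampleATest": 500, "ExampleBTest": 499}
print(still_blocking(failures, repeated))
```

A test that was never multiplexed counts as zero passes here, so it stays on 
the blocking list until someone runs it.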

I'm working with some of the other contributors on ways we can evolve our 
canonical CI infrastructure as well as making that environment reproducible in 
order to get us a more stable environment in the ASF while also allowing 
contributors with access to private cloud hardware to run testing at higher 
parallelization levels; stay tuned for more detail on that in the coming weeks 
as well.

One last note I want to call out - immense amounts of energy from many 
contributors have gone into hardening our test infrastructure and improving our 
tests in the run up to 4.1. 9 tests failing out of a total suite count of 
49,698 tests (as of build 202 on 4.1) is a 99.98% pass rate. That said, we're 
infrastructure software powering many of the world's most critical applications 
so we're going to keep pushing until we hit green and keep it there.
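
For anyone who wants to check the arithmetic behind that pass rate, here is a 
minimal sketch using only the two numbers quoted above (9 failures out of 
49,698 tests); the variable names are just for illustration.

```python
failed = 9        # failing tests on build 202 of 4.1, as quoted above
total = 49_698    # total suite count, as quoted above

# Pass rate as a percentage of the whole suite.
pass_rate = 100 * (total - failed) / total
print(f"{pass_rate:.2f}%")  # prints "99.98%"
```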


[New Contributors Getting Started]
We have a new entrant for new contributors! So technically this has been around 
awhile but I hadn't thought to promote it in these emails. We have an official 
management sidecar for Apache Cassandra as designed and delivered as part of 
CEP-1: 
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652224. This 
is a smaller and less complex project than the Cassandra Storage engine and 
Query Coordination so might prove an attractive on-ramp for any of you who have 
thought about getting involved but were daunted by the database internals 
themselves.

Open JIRA issues for the sidecar can be found here: 
https://issues.apache.org/jira/issues/?jql=project%20%3D%20CASSANDRASC%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20assignee%20DESC%2C%20priority%20DESC%2C%20updated%20DESC

And the project can be cloned from the github repo here: 
https://github.com/apache/cassandra-sidecar

On the Cassandra side, we've curated 24 "Starter Tickets" across our various 
releases that are unassigned right now - these are also good candidates if 
you're looking for something a little more bite-sized to get adjusted: 
https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=484&quickFilter=2160&quickFilter=2162.
 Likewise, documentation contributions and website contributions are generally 
good ways to get to know our project ecosystem, the commit process, and 
interact with some of the other contributors.

If you're feeling adventurous, there are quite a few tickets on the unassigned 
list on 4.0.x and 4.x that could be good candidates to take on: 
https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=484&quickFilter=2160.
 There's 46 unassigned issues in 4.0.x and 311 in 4.x so there's a lot of 
options to choose from there.

We hang out in #cassandra-dev on https://the-asf.slack.com and there's a 
@cassandra_mentors alias you can use to reach a bunch of us that have 
volunteered to help newcomers get situated. If you need an invite to the slack 
channel feel free to reply to just me on this email and I'll get you set up.

Here's a reference explaining the various types of contribution: 
https://cassandra.apache.org/_/community.html#how-to-contribute
An overview of the C* architecture: 
https://cassandra.apache.org/doc/latest/cassandra/architecture/overview.html
The getting started contributing guide: 
https://cassandra.apache.org/_/development/index.html


[Dev list Digest]
https://lists.apache.org/list?dev@cassandra.apache.org:lte=26d:
26 days is a lot of ground to cover here. :)

The thread on whether we were going to do 4.2 or 5.0 came to a close here: 
https://lists.apache.org/thread/ymj3737x25b7bbqv9lp27w5v1ftc83j9. Results are 
enshrined in CASSANDRA-17973: 
https://issues.apache.org/jira/browse/CASSANDRA-17973 (spoiler alert: we're 
going with 5.0)

We had a solid discussion about changes to improve circleci 
(https://lists.apache.org/thread/c7hp1wt06r14v1vpovjd5mzy62gdsxqh) that 
culminated in CASSANDRA-18007 being created: 
https://issues.apache.org/jira/browse/CASSANDRA-18007.

Erick Ramirez provided a PR and proposal for a formal events page for our 
website: https://lists.apache.org/thread/hn1b8ymn5sq3w31dvrorroqm2q7yw82v, that 
can be seen here now that it's merged: 
https://cassandra.apache.org/_/events.html

Derek Chen-Becker had a general question about our usage of sh vs. bash: 
https://lists.apache.org/thread/dzn34v18rhgsxo9grlmxrvxnp0521hgz. The quick and 
dirty lazy consensus there seems to be "user-facing don't change from sh, 
dev-facing let's go bash".

Derek has a well thought out and articulated proposal about refactoring and 
cleaning up our CircleCI config to make use of some of the idiomatic features 
and parameterization available in the ecosystem: 
https://lists.apache.org/thread/mvql1p5y2j7so18427zcg4zxc9vzl7l3.

We've had some tests slip through the cracks historically as they didn't match 
the prescribed regex that picks up test file names; Stefan Miklosovic called 
this out on a thread here: 
https://lists.apache.org/thread/vhqprqcv070vmomoozyqdn75fvdd1oll. There's a 
couple of proposals that have come up on the thread (that are ultimately 
complementary) - using Checkstyle to force a certain file format and extending 
our logic during our build to look for non-abstract files in the test directory 
containing the @Test annotation. No real closure on this yet, and ultimately 
the person willing to do the work has the final say on it if nobody has any 
major concerns with an implementation which is the case here.
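
To make the second proposal concrete, here is a minimal sketch of such a scan. 
It is an assumption-laden illustration, not the implementation under 
discussion: it assumes tests are Java files, that the expected naming 
convention is a file name ending in Test.java, and that a simple text match on 
"abstract class <FileName>" is good enough to spot abstract base classes. The 
function name and the directory passed in are hypothetical.

```python
import re
from pathlib import Path

# Matches a line that starts with the @Test annotation (but not e.g. @TestOnly).
TEST_ANNOTATION = re.compile(r"^\s*@Test\b", re.MULTILINE)


def find_misnamed_tests(test_root):
    """Return sorted paths of non-abstract Java files that contain @Test but
    whose file name does not end in Test.java (the assumed convention)."""
    misnamed = []
    for path in Path(test_root).rglob("*.java"):
        if path.name.endswith("Test.java"):
            continue  # already matched by the naming convention
        # Read with an explicit encoding rather than the system default.
        source = path.read_text(encoding="utf-8", errors="replace")
        # Crude heuristic: skip files whose top-level class is abstract.
        abstract_decl = r"\babstract\s+class\s+" + re.escape(path.stem) + r"\b"
        if re.search(abstract_decl, source):
            continue
        if TEST_ANNOTATION.search(source):
            misnamed.append(path)
    return sorted(misnamed)
```

Calling find_misnamed_tests("test") from a checkout (the directory name is a 
guess) would list candidate files that the name-based pattern skips; a build 
step could fail when that list is non-empty.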

A few days ago David Capwell asked about places in our code where we haven't 
actually specified an encoding, meaning they've relied on the system-specified 
default: https://lists.apache.org/thread/sokxf46s7hyoxr9q4wm7dv3q2nm19nt3. I've 
personally read that email three times now and can't think of a useful response 
other than to back away slowly, so maybe one of you will see that here and 
chime in. :)

And last but not least in this marathon catch-up, Ekaterina has put together a 
proposal for extending our code style regarding when we access JDK internals 
and when to hit the dev list for consensus on this thread: 
https://lists.apache.org/thread/ydgg308jl6sfcwg92kf6m7ylqqo089ho. Her proposal 
can be found here: 
https://github.com/ekaterinadimitrova2/cassandra-website/commit/4a9edc7e88fd9fc2c6aa8a84290b75b02cac03bf


[ASF CI Trends]
https://butler.cassandra.apache.org/#/

Here's our trends on our branches for the last 26 days:

3.0: 13 -> 10
3.11: 22 -> 11
4.0: 6 -> 2
4.1: 14 -> 9
trunk: 7 -> 21

We discussed 4.1 up above; 3.0 through 4.0 are trending in a good direction. 
Looks like quite a few of the trunk failures are from new messaging in logs on 
teardown that either don't have exceptions yet in the teardown parser or tests 
that haven't been updated to change logic to match new defaults on trunk. I'd 
advocate for all those things being fixed _before_ they get into trunk of 
course, but I'm also responsible for some of them so I will refrain from 
throwing stones from within this fine glass house I'm in.


[Release progress]
https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=484&quickFilter=2278

I know I'm behind when I have to create a custom quick filter to have the 
kanban show some strange number of days. So in the last 26 days we have:

4.1 rc / ga: 7 issues
- Fixing generate.sh behavior w/out options provided (CASSANDRA-17995)
- Fix race condition on repair snapshots (CASSANDRA-17955)
- Add --resolve-ip option on nodetool gossipinfo (CASSANDRA-17934)
- Automatically detect and repeat new or changed tests in circleci config 
(CASSANDRA-17939)
- Update What's New page for 4.1 and trunk (CASSANDRA-17976)
- Update Netbeans project file for dependency (CASSANDRA-18002)
- RPM installation on centos7 is broken (CASSANDRA-17765)

4.0.x: 5 issues
- CircleCI: j11_utest_fqltool fails to build (CASSANDRA-18020)
- CircleCI: Skip checkstyle in the ant-based repeated tests (CASSANDRA-18000)
- Fix CircleCI config for running python upgrade tests on 3.0 and 3.11 
(CASSANDRA-17912)
- Update debian packages for bullseye (CASSANDRA-17871)
- CircleCI: Add jobs for running specialized unit tests with Java 11 
(CASSANDRA-17987)

4.X / Next: 6 issues
- Round out cqlsh completion test coverage (CASSANDRA-16640)
- Log JVM arguments at in-JVM test class initialization (CASSANDRA-16664)
- nodetool bootstrap resume returns success even if there is an error during 
bootstrap (CASSANDRA-16491)
- Make resumable bootstrap feature optional (CASSANDRA-17679)
- Include GitSHA in nodetool version output (CASSANDRA-17753)
- CEP-19: Trie Memtable implementation (CASSANDRA-17240)

Phew! And this is why I should keep to the biweekly cadence; there's a lot 
going on these days. :)

~Josh
