David,

Before we proceed, I think we should make sure we're all understanding the same problem here. Starting with this:

> I believe the CQL protocol is backwards compatible but the Java API is not.
> For example "com.datastax.driver.core.Session" is now
> "com.datastax.oss.driver.api.core.session.Session" and there is no more
> "Cluster" class. Might be fairly trivial to fix though, if that's the path
> of least resistance.

From what I've learned using Cassandra 3 and 4 in my day job and reading up on this stuff for the sake of discussion, that all tracks. We used the ~4.11 driver in Spring Boot on both v3 and v4 clusters without issue during an upgrade. So I don't see any reason to factor in the "changes from DataStax 3 to 4" since the changes were likely a one-off decision meant to position the driver for better future support and stability.

TL;DR, we can dump v3 compatibility and the only thing our users will notice is if we make the controller service totally incompatible with the one they're already using which is something we can actively avoid.

On Tue, Mar 19, 2024 at 2:00 PM David Handermann <exceptionfact...@apache.org> wrote:

> All,
>
> I support a Controller Service API abstraction around the Cassandra
> Driver. The changes from DataStax 3 to 4 already highlight the need
> for that abstraction. The donation of the DataStax Java driver to
> Apache [1] also shows the value of providing some level of isolation,
> if at all possible.
>
> I have not taken a close look at the Matt's branch, and the details of
> the abstraction are important, but having the abstraction can be
> useful to avoid getting back to this same situation.
>
> Regards,
> David Handermann
>
> [1] https://github.com/apache/cassandra-java-driver/
>
> On Tue, Mar 19, 2024 at 12:37 PM Mike Thomsen <mikerthom...@gmail.com>
> wrote:
> >
> > Matt,
> >
> > I got that. My point was that the Java changes appear to be a one time
> > thing that DataStax did to make a better driver with a much more
> > future-proof API.
> > Since Scylla tracks them as closely as possible, I
> > suspect that we don't need to plan for a bunch of abstraction to isolate
> > Java changes.
> >
> > On Tue, Mar 19, 2024 at 11:07 AM Steven Matison <steven.mati...@gmail.com>
> > wrote:
> >
> > > That was kinda where i got stuck and fell out on my branch/jira. Mike and I wanted to make a new controller service , without backward compatibility; and remove the duplicate driver/connection properties found in some of the processors.
> > >
> > > I agree taking out all old stuff and making new controller service makes most sense. 4.x and 5.x should be mostly backwards compatible to 2&3.x with how it’s used within current processors.
> > >
> > > On Tue, Mar 19, 2024 at 10:49 AM Matt Burgess <mattyb...@apache.org> wrote:
> > >
> > > > The abstraction is to isolate Java API changes, not protocol compatibility
> > > > Changing to the java-driver comes with a number of changes to the code (see Steven's and my branches), if we can abstract that API it should lead to more maintainable code in the future by not having to change any processors, just the controller service implementation.
> > > >
> > > > On Tue, Mar 19, 2024 at 10:14 AM Mike Thomsen <mikerthom...@gmail.com> wrote:
> > > >
> > > > > https://opensource.docs.scylladb.com/stable/using-scylla/drivers/cql-drivers/scylla-java-driver.html
> > > > >
> > > > > Directly quoting Scylla docs here:
> > > > >
> > > > > > The Scylla Java Driver is a drop-in replacement for the DataStax Java Driver. As such, no code changes are needed to use this driver.
> > > > >
> > > > > On Tue, Mar 19, 2024 at 10:13 AM Mike Thomsen <mikerthom...@gmail.com> wrote:
> > > > >
> > > > > > Matt,
> > > > > >
> > > > > > I don't think we need to really "abstract above" the drivers because the Java DataStax driver appears to support 4.X all the way back to 2.X, as well as the enterprise versions from DataStax
> > > > > >
> > > > > > https://docs.datastax.com/en/driver-matrix/docs/java-drivers.html
> > > > > >
> > > > > > Similar situation with Scylla. When I looked at the driver, it appeared to copy verbatim the entire public API of that driver. So I think before we dive into abstractions, it's worth doing a bit more validation of these details. IMHO, this might be a much lighter lift than anticipated.
> > > > > >
> > > > > > On Mon, Mar 18, 2024 at 4:30 PM Matt Burgess <mattyb...@gmail.com> wrote:
> > > > > >
> > > > > >> Totally agree, that's what my branch does (see link in previous email). The more I work with it, the more I think I can abstract it further from their JDBC-like API but I started with a bunch of delegate classes then I figure I'll see where I can consolidate to more abstract concepts. If I don't have to support Cassandra 3 with the new API, so much the better.
> > > > > >>
> > > > > >> Regards,
> > > > > >> Matt
> > > > > >>
> > > > > >> On Mon, Mar 18, 2024 at 4:14 PM David Handermann <exceptionfact...@apache.org> wrote:
> > > > > >>
> > > > > >> > Matt et al,
> > > > > >> >
> > > > > >> > It is good to see the background effort on moving Cassandra capabilities in a supportable direction.
> > > > > >> >
> > > > > >> > I think new Cassandra components will require a significant departure from current Controller Service abstractions. Right now, the existing service interface does not provide a clean abstraction from the Cassandra library, which is part of the reason for the current coupling to the legacy driver version.
> > > > > >> >
> > > > > >> > Following up from Joe's comments, it seems like the cleanest way forward is to deprecate the current bundle on the 1.x branch, and remove the current bundle from the main branch. That will provide a clean slate for new Service and Processor implementations, without concern for uncertain compatibility questions.
> > > > > >> >
> > > > > >> > Regards,
> > > > > >> > David Handermann
> > > > > >> >
> > > > > >> > On Mon, Mar 18, 2024 at 2:35 PM Matt Burgess <mattyb...@apache.org> wrote:
> > > > > >> >
> > > > > >> > > What do y'all think about removing the individual connection properties from the Cassandra processors for NiFi 2.0 and requiring a CassandraSessionProvider instead? I think we started doing that elsewhere (Elasticsearch maybe?), I noticed duplicate code in the CassandraSessionProvider and AbstractCassandraProcessor, if we keep those properties I can refactor them into a utility class.
> > > > > >> > >
> > > > > >> > > Thanks,
> > > > > >> > > Matt
> > > > > >> > >
> > > > > >> > > On Fri, Mar 15, 2024 at 2:44 PM Steven Matison <steven.mati...@gmail.com> wrote:
> > > > > >> > >
> > > > > >> > > > I got through quite a bit of work to enable 4.x…
> > > > > >> > > >
> > > > > >> > > > The 3.x pieces that were not backwards compatible is very edge use case and could have been done slightly differently but with work around.
> > > > > >> > > >
> > > > > >> > > > https://github.com/steven-matison/nifi/tree/nifi-10120-1
> > > > > >> > > >
> > > > > >> > > > On Fri, Mar 15, 2024 at 2:30 PM Matt Burgess <mattyb...@apache.org> wrote:
> > > > > >> > > >
> > > > > >> > > > > Oops used the wrong email address so if there have been responses to the Cassandra thread since mine I missed them, my bad!
> > > > > >> > > > >
> > > > > >> > > > > On Fri, Mar 15, 2024 at 2:00 PM Matt Burgess <mattyb...@gmail.com> wrote:
> > > > > >> > > > >
> > > > > >> > > > > > I believe the CQL protocol is backwards compatible but the Java API is not. For example "com.datastax.driver.core.Session" is now "com.datastax.oss.driver.api.core.session.Session" and there is no more "Cluster" class. Might be fairly trivial to fix though, if that's the path of least resistance.
> > > > > >> > > > > >
> > > > > >> > > > > > On Fri, Mar 15, 2024 at 1:40 PM Joe Witt <joe.w...@gmail.com> wrote:
> > > > > >> > > > > >
> > > > > >> > > > > >> Matt
> > > > > >> > > > > >>
> > > > > >> > > > > >> I dont know a ton about Cassandra but when I looked at client/driver notes for 4+ it said it was compatible all the way back to 3.x. Not sure what that means but it surely seems worth exploring. Also I dont know if the 4.x drivers get rid of the vulnerable bits.
> > > > > >> > > > > >>
> > > > > >> > > > > >> Thanks
> > > > > >> > > > > >>
> > > > > >> > > > > >> On Fri, Mar 15, 2024 at 10:39 AM Matt Burgess <mattyb...@apache.org> wrote:
> > > > > >> > > > > >>
> > > > > >> > > > > >> > At the very least we should upgrade to Cassandra 3.11.6:
> > > > > >> > > > > >> >
> > > > > >> > > > > >> > https://github.com/apache/cassandra/blob/cassandra-3.11.16/CHANGES.txt
> > > > > >> > > > > >> >
> > > > > >> > > > > >> > On Fri, Mar 15, 2024 at 1:31 PM Matt Burgess <mattyb...@apache.org> wrote:
> > > > > >> > > > > >> >
> > > > > >> > > > > >> > > If the community agrees to get rid of Cassandra 3 that'll save me effort on the refactor after I add Cassandra 4 :) Otherwise those vulnerabilities would only be in a "new" Cassandra 3 services NAR that would not be included in the convenience binary.
> > > > > >> > > > > >> > >
> > > > > >> > > > > >> > > On Fri, Mar 15, 2024 at 1:28 PM Joe Witt <joe.w...@gmail.com> wrote:
> > > > > >> > > > > >> > >
> > > > > >> > > > > >> > >> Mike, Matt,
> > > > > >> > > > > >> > >>
> > > > > >> > > > > >> > >> Happy to hear you both have active efforts or are interested in doing so. Can you help me understand more specifically what that means for the current set of components?
> > > > > >> > > > > >> > >>
> > > > > >> > > > > >> > >> The CVE hits are concerning and long standing. Supporting Cassandra 3 implies the current set of dependencies would remain too right?
> > > > > >> > > > > >> > >>
> > > > > >> > > > > >> > >> Is the current set of components we have ones we want to retain? We certainly need Cassandra components - but are the ones we have now the right ones?
> > > > > >> > > > > >> > >>
> > > > > >> > > > > >> > >> Thanks
> > > > > >> > > > > >> > >> Joe
> > > > > >> > > > > >> > >>
> > > > > >> > > > > >> > >> On Fri, Mar 15, 2024 at 10:25 AM Matt Burgess <mattyb...@apache.org> wrote:
> > > > > >> > > > > >> > >>
> > > > > >> > > > > >> > >> > I'm actively working this, I pushed my branch up in case anyone wants to take a look [1].
> > > > > >> > > > > >> > >> > The idea is to abstract the Cassandra API "up a couple levels" and provide implementations for Cassandra 3, 4, and eventually 5. For JDBC-like interfaces this is a PITA because of the API (Statement, PreparedStatement, BoundStatement, ResultSet, etc.) but I'm hoping we can find a common pattern for abstracting the third-party library implementation and API from the NiFi component (Processor, ControllerService, etc.) API. I think we're doing something similar for Kafka?
> > > > > >> > > > > >> > >> >
> > > > > >> > > > > >> > >> > Regards,
> > > > > >> > > > > >> > >> > Matt
> > > > > >> > > > > >> > >> >
> > > > > >> > > > > >> > >> > [1] https://github.com/mattyb149/nifi/tree/cassy4
> > > > > >> > > > > >> > >> >
> > > > > >> > > > > >> > >> > On Fri, Mar 15, 2024 at 8:43 AM Mike Thomsen <mikerthom...@gmail.com> wrote:
> > > > > >> > > > > >> > >> >
> > > > > >> > > > > >> > >> > > That’s been on my todo list for a little while but things kept coming up. I think I could get started on that now.
> > > > > >> > > > > >> > >> > > Based on my initial research it appears that scylla uses the exact same api as datastax so supporting both in a cql bundle should theoretically be fairly easy.
> > > > > >> > > > > >> > >> > >
> > > > > >> > > > > >> > >> > > Sent from my iPhone
> > > > > >> > > > > >> > >> > >
> > > > > >> > > > > >> > >> > > > On Mar 14, 2024, at 6:18 PM, Joe Witt <joew...@apache.org> wrote:
> > > > > >> > > > > >> > >> > > >
> > > > > >> > > > > >> > >> > > > Team,
> > > > > >> > > > > >> > >> > > >
> > > > > >> > > > > >> > >> > > > Cassandra remains a really important system to be able to send data to. However, it seems like we've not maintained these well. We have what appears to be at least a full generation behind on client versions (we are on 3x vs 4x which is the latest stable with 5x apparently coming shortly).
> > > > > >> > > > > >> > >> > > >
> > > > > >> > > > > >> > >> > > > We have components to send data, query data, and use Cassandra as a cache store.
> > > > > >> > > > > >> > >> > > > We have older mechanisms for json/avro and publish mechanisms for records.
> > > > > >> > > > > >> > >> > > >
> > > > > >> > > > > >> > >> > > > The libraries we do have depend on outdated versions of Guava and result in many CVE hits.
> > > > > >> > > > > >> > >> > > >
> > > > > >> > > > > >> > >> > > > I am inclined to think we should deprecate the 1.x components and remove them as-is from the 2.x line. Then re-introduce them with record only interfaces and built against the latest stable Cassandra/Datastax/ScyllaDB interfaces.
> > > > > >> > > > > >> > >> > > >
> > > > > >> > > > > >> > >> > > > I'd love to hear thoughts from those closer to this space both as a user and developer so we can make good next steps.
> > > > > >> > > > > >> > >> > > >
> > > > > >> > > > > >> > >> > > > Thanks
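
For anyone skimming the thread: the abstraction being debated (processors coding against a driver-neutral controller service so that only the service implementation changes when the driver's Java API does) could look roughly like the sketch below. This is only an illustration under assumptions, not code from any of the branches mentioned. The names CqlExecutionService, RecordingCqlService and countRows are hypothetical, and no real driver classes are used; a real implementation would wrap the driver's session object inside the service class.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CqlServiceSketch {

    /** Hypothetical driver-neutral API; processors would depend only on this. */
    interface CqlExecutionService {
        List<Map<String, Object>> execute(String cql, Object... values);
    }

    /**
     * Stand-in implementation that records statements and returns one canned row.
     * In a real service this would be the only class importing driver packages.
     */
    static final class RecordingCqlService implements CqlExecutionService {
        final List<String> executed = new ArrayList<>();

        @Override
        public List<Map<String, Object>> execute(String cql, Object... values) {
            executed.add(cql);
            Map<String, Object> row = new LinkedHashMap<>();
            row.put("id", 1);
            row.put("name", "example");
            return List.of(row);
        }
    }

    /** Processor-side logic: no driver types appear in the signature or body. */
    static int countRows(CqlExecutionService service, String cql, Object... values) {
        return service.execute(cql, values).size();
    }

    public static void main(String[] args) {
        CqlExecutionService service = new RecordingCqlService();
        System.out.println(countRows(service, "SELECT id, name FROM users WHERE id = ?", 1));
    }
}
```

Swapping drivers would then mean adding another implementation of the interface, with countRows and any other processor-side code left untouched. Whether that isolation is worth the extra layer is exactly the open question in the thread.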