Hey Andy,

This seems like a reasonable proposal.

We can probably skip cassandra-stress, since it looks like easy-cass-stress can be donated. That does need a driver upgrade to support a vector workload, but imo there's no point in investing more in cassandra-stress when we have an alternative with more features available.

Not a hill I'm going to die on, just an opportunity to do less work.

Jon

On Wed, Feb 12, 2025 at 3:06 PM Tolbert, Andy <x...@andrewtolbert.com> wrote:

> Hi All,
>
> I'd like to propose decoupling the java driver as a dependency from the
> core Cassandra server code.
>
> I also want to propose a path towards eventually migrating test and tools
> code from Apache Cassandra java driver 3.x to 4.x when the time is right
> for the project.
>
> Refactoring test code to 4.x is likely to be quite invasive, as I count
> 128 source files utilizing driver code. We'd want to find a good time to
> do this to minimize disruption to ongoing development.
>
> Java driver 4.x is effectively a rewrite of the 3.x driver. Its first
> release was in March of 2019. While it has similar APIs, it is not binary
> compatible with the 3.x driver [1].
>
> While there hasn't been a clear decision on how the 3.x driver will be
> supported going forward (although we should consider discussing this!), we
> expect and have seen active development take place almost exclusively on
> the 4.x driver.
>
> It would be useful to migrate to the 4.x driver to test new and future
> features that the 4.x driver will actively support. For example, the 4.x
> driver supports Vector types, where the 3.x driver does not.
>
> I've gone through the codebase and identified the following uses of the
> driver:
>
> 0. Core code that uses the driver
>
> * UntypedResultSet uses CodecUtils.fromUnsignedToSignedInt from the driver,
>   which just adds Integer.MIN_VALUE to an int, so it can easily be removed.
> * PreparedStatementHelper is used only by dtest fuzz tests to validate
>   Prepared Statements. Can be moved to test code.
> * ThreadAwareSecurityManager.checkPermission makes reference to skipping
>   checking accessDeclaredMembers due to use of CodecUtils; that can
>   probably be removed once the CodecUtils use is gone.
> * sstableloader uses the driver to fetch schema and metadata
>
> 1. Tools that use the driver
>
> * fqltool replay (replaying queries from captured logs)
> * cassandra-stress (making queries to generate load)
>
> 2. Test code
>
> * Understandably, quite a bit of test code uses the driver. This is where I
>   anticipate the most work would be needed.
>
> I'd like to propose doing the following:
>
> Can be done now:
>
> * Move sstableloader source into its own tools directory, much like fqltool
>   and cassandra-stress. For compatibility, we could retain the existing
>   shell script entry point (bin/sstableloader).
> * Update remaining core code to remove all use of the driver. As shown
>   above, there is not much to change here and this should be relatively
>   easy to accomplish.
> * Update the build and scripts to establish separate classpaths for the
>   server and the respective tools. We would exclude the driver and its
>   dependencies (that aren't required otherwise) from the server. The driver
>   would still be included in the built package, so this wouldn't reduce the
>   size of the binary, but it would remove the driver from the server's
>   classpath, which would de-risk upgrading the driver and having it or its
>   dependencies cause possible runtime issues.
>
> To be done next:
>
> * Refactor sstableloader, fqltool and cassandra-stress to use the 4.x
>   driver.
>
> To be done when the timing works for the project:
>
> * Refactor tests to use the 4.x driver.
>
> Hopefully this proposed approach makes sense, I'd be eager to hear any
> feedback or suggestions!
>
> Thanks,
> Andy
>
> [1]:
> https://docs.datastax.com/en/developer/java-driver/4.17/upgrade_guide/index.html#4-0-0
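
For the UntypedResultSet item in the quoted proposal, here is a rough sketch of what a driver-free replacement for CodecUtils.fromUnsignedToSignedInt could look like, going only by the description above that it adds Integer.MIN_VALUE to an int. The class name UnsignedIntConversion is hypothetical, and where such a helper would actually live in the Cassandra codebase is an open question, not something decided in this thread.

```java
// Hypothetical stand-in for the driver's CodecUtils.fromUnsignedToSignedInt,
// based only on the description "adding Integer.MIN_VALUE to an int".
// The class name and placement are assumptions for illustration.
final class UnsignedIntConversion {

    private UnsignedIntConversion() {
    }

    // Shifts a value from the unsigned range onto the signed range.
    // Java int addition wraps on overflow, which is what makes this a
    // one-liner: no widening to long is needed.
    static int fromUnsignedToSignedInt(int unsigned) {
        return unsigned + Integer.MIN_VALUE;
    }

    public static void main(String[] args) {
        // Unsigned 0 maps to the smallest signed int.
        System.out.println(fromUnsignedToSignedInt(0));                 // -2147483648
        // The bit pattern 0x80000000 (unsigned 2^31) maps to 0.
        System.out.println(fromUnsignedToSignedInt(Integer.MIN_VALUE)); // 0
        // The bit pattern 0xFFFFFFFF (unsigned 2^32 - 1) maps to the largest signed int.
        System.out.println(fromUnsignedToSignedInt(-1));                // 2147483647
    }
}
```

Since the body is a single wrapping addition, inlining it at the call site in UntypedResultSet would work just as well as a separate helper.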