Hi All,

I'd like to propose decoupling the Java driver as a dependency from the core Cassandra server code.
I also want to propose a path towards eventually migrating test and tools code from the Apache Cassandra Java driver 3.x to 4.x when the time is right for the project. Refactoring test code to 4.x is likely to be quite invasive, as I count 128 source files using driver code. We'd want to find a good time to do this to minimize disruption to ongoing development.

Java driver 4.x is effectively a rewrite of the 3.x driver. Its first release was in March of 2019. While it has similar APIs, it is not binary compatible with the 3.x driver [1]. There hasn't been a clear decision on how the 3.x driver will be supported going forward (although we should consider discussing this!), but we expect, and have seen, active development take place almost exclusively on the 4.x driver. It would be useful to migrate to the 4.x driver so we can test new and future features that only the 4.x driver will actively support. For example, the 4.x driver supports Vector types, where the 3.x driver does not.

I've gone through the codebase and identified the following uses of the driver:

0. Core code that uses the driver
 * UntypedResultSet uses CodecUtils.fromUnsignedToSignedInt from the driver, which just adds Integer.MIN_VALUE to an int, so it can easily be removed.
 * PreparedStatementHelper is used only by dtest fuzz tests to validate prepared statements. It can be moved to test code.
 * ThreadAwareSecurityManager.checkPermission makes reference to skipping the accessDeclaredMembers check due to the use of CodecUtils. That can probably be removed once the CodecUtils use is gone.
 * sstableloader uses the driver to fetch schema and metadata.

1. Tools that use the driver
 * fqltool replay (replaying queries from captured logs)
 * cassandra-stress (making queries to generate load)

2. Test code
 * Understandably, quite a bit of test code uses the driver. This is where I anticipate the most work would be needed.
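To illustrate how small the UntypedResultSet change is, here is a rough sketch of an in-tree replacement for the CodecUtils call. The class name UnsignedIntConversion is made up for illustration, and where such a helper would actually live is an assumption on my part; the only behavior it relies on is the one described above, adding Integer.MIN_VALUE to an int.

```java
// Hypothetical helper; the class name and location are illustrative only
// and not taken from the Cassandra codebase.
final class UnsignedIntConversion {
    private UnsignedIntConversion() {}

    /**
     * Shifts the value by Integer.MIN_VALUE. Java int addition wraps on
     * overflow (two's complement), so inputs with the high bit set land
     * in the non-negative half of the int range.
     */
    static int fromUnsignedToSignedInt(int unsigned) {
        return unsigned + Integer.MIN_VALUE;
    }
}
```

With something like this in place, the driver import in UntypedResultSet could presumably be dropped without changing behavior.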
I'd like to propose doing the following:

Can be done now:
 * Move the sstableloader source into its own tools directory, much like fqltool and cassandra-stress. For compatibility, we could retain the existing shell script entry point (bin/sstableloader).
 * Update the remaining core code to remove all use of the driver. As shown above, there is not much to change here and this should be relatively easy to accomplish.
 * Update the build and scripts to establish separate classpaths for the server and the respective tools. We would exclude the driver and its dependencies (those that aren't required otherwise) from the server. The driver would still be included in the built package, so this wouldn't reduce the size of the binary, but it would remove the driver from the server's classpath. That de-risks upgrading the driver, since neither it nor its dependencies could then cause runtime issues in the server.

To be done next:
 * Refactor sstableloader, fqltool and cassandra-stress to use the 4.x driver.

To be done when the timing works for the project:
 * Refactor tests to use the 4.x driver.

Hopefully this proposed approach makes sense. I'd be eager to hear any feedback or suggestions!

Thanks,
Andy

[1]: https://docs.datastax.com/en/developer/java-driver/4.17/upgrade_guide/index.html#4-0-0