Hey Andy,

This seems like a reasonable proposal.

We can probably skip cassandra-stress, since it looks like easy-cass-stress
can be donated.  That does need a driver upgrade to support a vector
workload, but imo there's no point in investing more in cassandra-stress
when we have an alternative with more features available.  Not a hill I'm
going to die on, just an opportunity to do less work.

Jon




On Wed, Feb 12, 2025 at 3:06 PM Tolbert, Andy <x...@andrewtolbert.com> wrote:

> Hi All,
>
> I'd like to propose decoupling the java driver as a dependency from the
> core Cassandra server code.
>
> I also want to propose a path towards eventually migrating test and tools
> code from Apache Cassandra java driver 3.x to 4.x when the time is right
> for the project.
>
> Refactoring test code to 4.x is likely to be quite invasive, as I count
> 128 source files utilizing driver code.  We'd want to find a good time to
> do this to minimize disruption to ongoing development.
>
> Java driver 4.x is effectively a rewrite of the 3.x driver.  Its first
> release was in March of 2019.  While it has similar APIs, it is not binary
> compatible with the 3.x driver [1].
>
> While there hasn't been a clear decision on how the 3.x driver will be
> supported going forward (although we should consider discussing this!), we
> expect, and have already seen, active development take place almost
> exclusively on the 4.x driver.
>
> It would be useful to migrate to the 4.x driver so we can test new and
> future features that the 4.x driver actively supports.  For example, the
> 4.x driver supports vector types, whereas the 3.x driver does not.
>
> I've gone through the codebase and identified the following uses of the
> driver:
>
> 0. Core code that uses the driver
>
> * UntypedResultSet uses CodecUtils.fromUnsignedToSignedInt from the driver,
>   which just adds Integer.MIN_VALUE to an int, so this use can easily be
>   removed.
> * PreparedStatementHelper is used only by dtest fuzz tests to validate
>   prepared statements.  It can be moved to test code.
> * ThreadAwareSecurityManager.checkPermission has special handling that skips
>   the accessDeclaredMembers check due to the use of CodecUtils.  We can
>   probably remove that special case once CodecUtils is no longer used.
> * sstableloader uses the driver to fetch schema and metadata
>
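To illustrate how small the UntypedResultSet change is, here is a minimal sketch of a local replacement for the driver call. It assumes the behaviour described above (adding Integer.MIN_VALUE to an int); the class name UnsignedIntSketch is hypothetical, and where such a helper would live in the Cassandra tree is left open.

```java
// Hypothetical local stand-in for the driver's CodecUtils.fromUnsignedToSignedInt,
// assuming it only adds Integer.MIN_VALUE to its argument.
final class UnsignedIntSketch {

    private UnsignedIntSketch() {}

    /**
     * Maps an int holding an unsigned 32-bit value into the signed range by
     * adding Integer.MIN_VALUE. Java int arithmetic wraps on overflow, so the
     * unsigned value 2^31 (stored as Integer.MIN_VALUE) maps to 0.
     */
    static int fromUnsignedToSignedInt(int unsigned) {
        return unsigned + Integer.MIN_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(fromUnsignedToSignedInt(Integer.MIN_VALUE));     // prints 0
        System.out.println(fromUnsignedToSignedInt(Integer.MIN_VALUE + 1)); // prints 1
        System.out.println(fromUnsignedToSignedInt(0));                     // prints -2147483648
    }
}
```

Because the whole operation is a single wrapping addition, inlining it removes the driver import without any change in behaviour.
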
> 1. Tools that use the driver
>
> * fqltool replay (replaying queries from captured logs)
> * cassandra-stress (making queries to generate load)
>
> 2. Test code
>
> * Understandably, quite a bit of test code uses the driver.  This is where I
>   anticipate the most work would be needed.
>
> I'd like to propose doing the following:
>
> Can be done now:
>
> * Move the sstableloader source into its own tools directory, much like
>   fqltool and cassandra-stress.  For compatibility, we could retain the
>   existing shell script entry point (bin/sstableloader).
> * Update the remaining core code to remove all use of the driver.  As shown
>   above, there is not much to change here, and this should be relatively
>   easy to accomplish.
> * Update the build and scripts to establish separate classpaths for the
>   server and the respective tools.  We would exclude the driver and its
>   dependencies (those that aren't otherwise required) from the server.  The
>   driver would still be included in the built package, so this wouldn't
>   reduce the size of the binary, but it would remove the driver from the
>   server's classpath.  That would reduce the risk of a driver upgrade, or
>   the driver's dependencies, causing runtime issues in the server.
>
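Once the classpaths are split, one way to sanity-check the result might be a small probe run with only the server classpath, checking whether the driver's entry-point classes can be loaded. This is a minimal sketch: DriverClasspathProbe is a hypothetical helper, not existing Cassandra code, and the probed names are the 3.x driver's com.datastax.driver.core.Cluster and the 4.x driver's com.datastax.oss.driver.api.core.CqlSession.

```java
// Hypothetical helper: reports whether a class can be loaded from the
// classpath this program was started with.
final class DriverClasspathProbe {

    private DriverClasspathProbe() {}

    /** Returns true if the named class loads; it is not initialized. */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, DriverClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Entry-point classes of the 3.x and 4.x java drivers, respectively.
        String[] driverClasses = {
            "com.datastax.driver.core.Cluster",
            "com.datastax.oss.driver.api.core.CqlSession"
        };
        for (String name : driverClasses) {
            System.out.println(name + " on classpath: " + isOnClasspath(name));
        }
    }
}
```

Passing false for initialization means the probe only checks visibility and never runs the driver's static initializers.
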
> To be done next:
>
> * Refactor sstableloader, fqltool and cassandra-stress to use the 4.x driver.
>
> To be done when the timing works for the project:
>
> * Refactor tests to use the 4.x driver.
>
> Hopefully this proposed approach makes sense.  I'd be eager to hear any
> feedback or suggestions!
>
> Thanks,
> Andy
>
> [1]:
> https://docs.datastax.com/en/developer/java-driver/4.17/upgrade_guide/index.html#4-0-0
>
