Thanks for reaching out to the list Radim; this is interesting stuff.

From skimming the PR on the Spring side and the conversation there, it looks like the proposal is to have this live inside the Java driver for Cassandra instead of in the spring-boot lib, which I can see the argument for.

If we distill this down to precisely the problem we're trying to address, or the improvement we're going for, how would you phrase it? i.e. "Take application startup from Nms down to Mms"? I ask because that's the "pro" we'll need to weigh against updating the driver's topology map of the cluster, resource handling and potential leaks on shutdown/startup, and the complexity of taking an implementation like this into the driver code. Nothing insurmountable of course, just worth weighing the two.

On Thu, Mar 6, 2025, at 3:34 PM, Radim Vansa wrote:
> Hi all,
>
> I would like to make applications using Cassandra Java Driver,
> particularly those built with Spring Boot, Quarkus or similar
> frameworks, work with the OpenJDK CRaC project [1]. I've already created a
> patch for Spring Boot [2] but Spring folks think that these changes are
> too dependent on driver internals, and suggested contributing the support to
> Cassandra directly.
>
> The patch involves closing all connections before checkpoint, and
> re-establishing these after restore. I have implemented that through
> sending a `NodeStateEvent -> FORCED_DOWN` on the bus for all connected
> nodes. As a follow-up I could develop some way to inform the session
> about a new topology e.g. if the cluster addresses change.
>
> Before jumping into implementing a PR I would like to ask what you think
> is the best approach to do this. I can think of two ways:
>
> 1) Native CRaC support
>
> The driver would have a dependency on `org.crac:crac` [3]; this is a
> small (13kB) library that provides the interfaces and a dummy noop
> implementation if the target JVM does not support CRaC. Then
> `DefaultSession` would register an `org.crac.Resource` implementation
> that would handle the checkpoint. This has the advantage of providing
> the best fan-out into any project consuming the driver without any further work.
>
> 2) Exposing neutral methods
>
> To save frameworks from relying on internals, `DefaultSession` would
> expose `.suspend()` and `.resume()` methods that would implement the
> connection cut-off without importing any dependency. After upgrading to
> the latest release, frameworks could use these methods in a way that suits
> them. I wouldn't add those methods to the `CqlSession` interface (as
> that would be a breaking change) but only to `DefaultSession`.
>
> Would Cassandra accept either of these, to let people checkpoint
> (snapshot) their applications and restore them within tens of
> milliseconds? Naturally it is possible to close the session object
> completely and create a new one, but the ideal solution would require no
> application changes beyond a dependency upgrade.
>
> Btw. I am aware that there is an inherent race between possible topology
> change and shutdown of current nodes (and I am listening for hints that
> would let us prevent that), but it is reasonable to expect that users
> will checkpoint the application in a quiescent state. And if the
> topology update breaks the checkpoint, it is always possible to try it
> again.
>
> Thank you for your opinions and ideas!
>
> Radim Vansa
>
>
> [1] https://wiki.openjdk.org/display/crac
>
> [2] https://github.com/spring-projects/spring-boot/pull/44505
>
> [3] https://mvnrepository.com/artifact/org.crac/crac/1.5.0
>
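
To make option 2 concrete for discussion, here is a rough sketch of the shape it could take. Everything in it is hypothetical: `SuspendableSession` is a toy stand-in rather than the real `DefaultSession`, connections are modelled as plain strings, and `CheckpointHook` only mimics the before-checkpoint / after-restore callbacks a framework would wire up. It does not use the driver or `org.crac` at all, and it ignores the topology race mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a driver session; NOT the real DefaultSession.
class SuspendableSession {
    private final List<String> contactPoints;
    // Each "connection" is just the node address in this sketch.
    private final List<String> openConnections = new ArrayList<>();
    private boolean suspended = false;

    SuspendableSession(List<String> contactPoints) {
        this.contactPoints = List.copyOf(contactPoints);
        connectAll();
    }

    private void connectAll() {
        openConnections.addAll(contactPoints);
    }

    // Drop every connection so nothing is open when the checkpoint is taken.
    synchronized void suspend() {
        if (suspended) {
            return; // idempotent: a second suspend is a no-op
        }
        openConnections.clear();
        suspended = true;
    }

    // Re-establish connections to the known nodes after restore.
    synchronized void resume() {
        if (!suspended) {
            return; // idempotent: resume without suspend is a no-op
        }
        connectAll();
        suspended = false;
    }

    synchronized int openConnectionCount() {
        return openConnections.size();
    }

    synchronized boolean isSuspended() {
        return suspended;
    }
}

// Hypothetical framework-side hook. In option 1 the equivalent logic would
// sit behind an org.crac.Resource registered by the driver itself.
class CheckpointHook {
    private final SuspendableSession session;

    CheckpointHook(SuspendableSession session) {
        this.session = session;
    }

    void beforeCheckpoint() {
        session.suspend();
    }

    void afterRestore() {
        session.resume();
    }
}
```

The point of the sketch is only that the session owns the suspend/resume state and the framework owns the lifecycle callbacks, so neither side needs the other's dependencies. Whether the real methods should be idempotent like this, or fail loudly on misuse, is one of the things worth deciding up front.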