Re: [VOTE] CEP-49: Hardware-accelerated compression

Štefan Miklošovič Tue, 06 Jan 2026 06:25:26 -0800

Something like that, yes. But this "decision logic" should be done in
Cassandra. A plugin is just that ... a plugin. If Cassandra determines
that it was unable to plug it in, then we will go with the built-in
version.
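
As a rough sketch of that division of responsibility (all names below,
such as Compressor, CompressorPlugin and CompressorSelector, are
hypothetical and are not Cassandra's actual API): the plugin only
reports whether it is usable, and the host decides once, at
construction time, which implementation to use.

```java
import java.util.List;

// Hypothetical, simplified stand-in for a compressor; not Cassandra's ICompressor.
interface Compressor {
    String name();
}

// Hypothetical plugin contract: the plugin only says whether it can be used
// and builds a compressor. It makes no fallback decisions of its own.
interface CompressorPlugin {
    boolean isAvailable() throws Exception; // may probe hardware and throw
    Compressor create();
}

final class CompressorSelector {
    // Decide once, at construction/startup time, which implementation to use.
    // Candidates are tried in order; the built-in compressor is the last resort.
    static Compressor select(List<CompressorPlugin> candidates, Compressor builtIn) {
        for (CompressorPlugin plugin : candidates) {
            try {
                if (plugin.isAvailable()) {
                    return plugin.create();
                }
            } catch (Exception e) {
                // Probe failed: treat the plugin as not pluggable and report it once.
                System.err.println("WARN: plugin probe failed, trying next option: " + e);
            }
        }
        return builtIn;
    }
}
```

With this shape, the de/compression methods themselves contain no
try-catch switching; whatever select() returned is the only
implementation in use.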


We did something similar when helping with the delivery of (1). In (2)
you can see how a plugin is applied; that is the place where we
_fall back_. How we react if it is not pluggable is an implementation
detail. Here we chose to just fall back to built-in compression; in
the case of crypto providers we chose to fail the startup.
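
The two reactions mentioned above could be expressed as a policy that
the host applies when installing a plugin. This is only an
illustrative sketch: UnavailablePolicy and PluginPolicy are made-up
names, not existing Cassandra classes or cassandra.yaml options.

```java
import java.util.function.Supplier;

// Hypothetical policy names; not actual cassandra.yaml options.
enum UnavailablePolicy { FALLBACK_TO_BUILT_IN, FAIL_STARTUP }

final class PluginPolicy {
    // Tries to obtain the plugin-provided implementation. If that fails, either
    // returns the built-in one (with a warning) or aborts, depending on the policy.
    static <T> T resolve(Supplier<T> pluginLoader, T builtIn, UnavailablePolicy policy) {
        try {
            return pluginLoader.get();
        } catch (RuntimeException e) {
            if (policy == UnavailablePolicy.FAIL_STARTUP) {
                throw new IllegalStateException("Plugin could not be installed; refusing to start", e);
            }
            System.err.println("WARN: plugin could not be installed, using built-in implementation: " + e.getMessage());
            return builtIn;
        }
    }
}
```

Under these assumptions, compression would call resolve(...) with
FALLBACK_TO_BUILT_IN, while a crypto provider would use FAIL_STARTUP.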

You may also look at how it is done here (3) and in the classes added
in that package.

I know that you tried to be minimalistic in your implementation in
order to not introduce anything new to cassandra.yaml. I think that
this approach might be re-evaluated, even after the vote. I would
welcome consistency around the logic when it comes to "plugins".

(1) https://issues.apache.org/jira/browse/CASSANDRA-18624
(2) https://github.com/apache/cassandra/commit/6ab45971fc651f78c8748f80e3cd6d4a1b6dbc50#diff-054af65b8d690b0fddc3e0a4ef05a80d8f1d6689b4f77912795fec019200666c
(3) https://github.com/apache/cassandra/commit/6ab45971fc651f78c8748f80e3cd6d4a1b6dbc50#diff-f6c790ffcab013d2fa520f17e7f11ecad12719ec561b5e73d6905e6ea903b7ec

On Mon, Jan 5, 2026 at 6:04 PM Kokoori, Shylaja
<[email protected]> wrote:
>
> Happy New Year everyone!
>
> Joey & Stefan,
> I apologize for the delayed response—I was away without email access.
> I believe I understand your point. We want to check for hardware acceleration 
> availability when creating a compressor. If the platform running Cassandra 
> supports hardware-accelerated compression, we utilize the accelerated version 
> and log any errors that occur. Otherwise, we fall back to the default 
> compression method.
> I have a proof-of-concept plugin available here. I'm assuming you're 
> referring to a function similar to the one linked below—is this correct?
> https://github.com/intel/qat-plugin-cassandra/blob/main/src/main/java/com/intel/qat/compression/deflate/QatDeflateCompressorFactory.java#L41
>
> Thank you,
> Shylaja
>
> ________________________________
> From: Joseph Lynch <[email protected]>
> Sent: Wednesday, December 17, 2025 7:17 AM
> To: [email protected] <[email protected]>
> Subject: Re: [VOTE] CEP-49: Hardware-accelerated compression
>
> Completely agree. Shylaja does that make sense to you and do you think the 
> QAT implementation can do the probing at construction/provider time in the 
> appropriate factory (basically a provider interface similar to how the JVM 
> does crypto providers)?
>
> So when we create the compressor instance, select QAT if available and 
> functional (and if you need to e.g. try something and catch an exception to 
> detect that, that's fine), then we can test various cases like "what if on 
> arm, what if on x86 not intel, what if on x86 old intel that doesn't support 
> Zstd but does support deflate" etc etc...?
>
> -Joey
>
> On Wed, Dec 17, 2025 at 10:05 AM Štefan Miklošovič <[email protected]> 
> wrote:
>
> You are absolutely spot on. This is what I was trying to explain: that
> we are wrapping it in try-catch and "acting as if nothing happened" when
> falling back to the default one we have. That is also the reason why I
> did not want to see _this kind of fallback_. There is nothing wrong
> with picking the compressor at the beginning upon startup as you
> suggested and working with only that one implementation. And if it
> is meant to fail, so be it. What I am against is this "dynamic
> switching" in the very de/compression methods.
>
> On Wed, Dec 17, 2025 at 3:57 PM Joseph Lynch <[email protected]> wrote:
> >
> > Does QAT not provide a way to detect what the hardware supports and return 
> > that capability at construction time so we can pick the fastest 
> > implementation that the hardware supports? That seems like a more robust 
> > way than inline exception handling and consistent with how we do the other 
> > native fallbacks, where we probe if they are available and fallback to a 
> > different instance entirely if not available.
> >
> > Agree the inline try-catch is inelegant and implies that sometimes QAT can 
> > succeed and sometimes it can fail. That should not be the case (either 
> > hardware acceleration exists or not).
> >
> > -Joey
> >
> > On Wed, Dec 17, 2025 at 9:31 AM Štefan Miklošovič <[email protected]> 
> > wrote:
> >>
> >> To be explicit, we are talking about this kind of fallbacking:
> >>
> >> https://gist.githubusercontent.com/smiklosovic/8efcdefadae0b6aae5c7eedd6cc948f7/raw/ae5716d077c1a37b4db901f81620f09d957dd303/gistfile1.txt
> >>
> >> I made a gist of that from the PR in case it gets updated / overwritten.
> >>
> >> The logic here is that the "QAT backed compressor" is used first and
> >> when it fails we fallback to the one we have in Cassandra. I have not
> >> found the implementation of that plugin, it is said to be added later
> >> on.
> >>
> >> So it is not about "we start Cassandra and then based on what is
> >> available we pick what to de/compress with and if QAT is not available
> >> we fall back to stuff we have already". It is more about "we will put
> >> this plugin on the class path, that will effectively override the
> >> compressor we are using AND IF THAT FAILS then while we are
> >> de/compressing we fall back to the default one we have".
> >>
> >> Do you see the slight difference in the semantics here when talking
> >> about "fallbacking"?
> >>
> >>
> >> On Wed, Dec 17, 2025 at 3:14 PM Joseph Lynch <[email protected]> wrote:
> >> >
> >> > Just noticed the discussion here, I think this is just another case of 
> >> > "native" code like we've done in the past. We try to load the native 
> >> > library (try to load up QAT), if that fails then we try finding the 
> >> > fastest implementation that works on the hardware they have. If you're 
> >> > running on say arm we are already falling back to pure java 
> >> > implementations of many things for example (afaik we only have native 
> >> > implementations for crypto, compression and hashing on x86, but I might 
> >> > have missed the arm patches).
> >> >
> >> > So instead of say x86 native -> fast java (unsafe) -> slow java it would 
> >> > be qat -> x86 native -> slow java (since afaik we don't want to use 
> >> > unsafe anymore). A log line helps the operator know _which_ of these 
> >> > they've ended up with so they can debug why they are spending so many 
> >> > cycles where they are, but I don't think the fallback is intrinsically 
> >> > hazardous (we already do transparent fallbacks for TLS, Compression and 
> >> > Hashing afaik).
> >> >
> >> > -Joey
> >> >
> >> > On Wed, Dec 17, 2025 at 1:53 AM Štefan Miklošovič 
> >> > <[email protected]> wrote:
> >> >>
> >> >> As mentioned, some combination of logging + metrics + maybe dying or
> >> >> something else?
> >> >>
> >> >> I don't know for now, too soon / specific to deal with that, but
> >> >> _something_ should be done, heh. I do not want to block otherwise
> >> >> helpful and valuable contributions on these technicalities, but they
> >> >> should be addressed.
> >> >>
> >> >> The "interesting" aspect of this acceleration hardware is that if it
> >> >> is baked into the CPU and that fails, what are we actually supposed to
> >> >> do with it? I do not know the details too much here but if it
> >> >> hypothetically failed then we are supposed to do what, replace CPU?
> >> >> Does a failure mean that the hardware as such is broken or the failure
> >> >> was just intermittent? If a disk fails we can replace it and restart
> >> >> the machine and rebuild or whatever, or we can just replace the whole
> >> >> node.
> >> >>
> >> >> Anyway, we can always think about that more in follow-up tickets after
> >> >> the initial delivery, but logging in a non-spamming manner + metrics
> >> >> would be the minimum here imho.
> >> >>
> >> >> On Wed, Dec 17, 2025 at 1:27 AM Josh McKenzie <[email protected]> 
> >> >> wrote:
> >> >> >
> >> >> > What if we went the same route we do for disk failure, have a sane 
> >> >> > default we collectively believe to be the "majority case", but also 
> >> >> > have a configuration knob in cassandra.yaml to choose a hard stop on 
> >> >> > failure if so inclined? Complexity is low, maintenance burden should 
> >> >> > be low.
> >> >> >
> >> >> > These discussions end up spinning trying to find the One Right Answer 
> >> >> > when there isn't one. You're right Stefan. And so is Scott. It 
> >> >> > depends. :)
> >> >> >
> >> >> > On Tue, Dec 16, 2025, at 2:11 PM, Štefan Miklošovič wrote:
> >> >> >
> >> >> > In the scenarios Scott described it does make sense to fall back, but
> >> >> > I am not sure about that when there is production traffic happening
> >> >> > and we rely on hardware de/compression and _that_ fails silently.
> >> >> >
> >> >> > It is one thing to not fail catastrophically when upgrading or
> >> >> > changing nodes, or when machines with that hardware are not present,
> >> >> > etc., and it is something different to actually expect that data will
> >> >> > be de/compressed with some acceleration and we just swallow the
> >> >> > exception and de/compress in software.
> >> >> >
> >> >> > My perception here is that Cassandra is embracing the philosophy that
> >> >> > if it fails, so be it, and change the hardware. Heck, we have a whole
> >> >> > class of logic around what should happen if there is some kind of a
> >> >> > disk failure.
> >> >> >
> >> >> > While here we are going to act as if, when the very hardware I am
> >> >> > supposed to de/compress with fails to do so, I just fall back to
> >> >> > software and ... that's it? Should there not be some kind of a
> >> >> > mechanism to also die when something goes wrong here?
> >> >> >
> >> >> > On Tue, Dec 16, 2025 at 7:10 PM Josh McKenzie <[email protected]> 
> >> >> > wrote:
> >> >> > >
> >> >> > > As a user, I'd rather have a WARN in my logs than to be unable to 
> >> >> > > start the database without changing cluster-wide configuration like 
> >> >> > > schema / compaction parameters.
> >> >> > >
> >> >> > > Strong +1 here.
> >> >> > >
> >> >> > > While on the one hand we expect homogenous hardware environments 
> >> >> > > for clusters, to Scott's point that's not always going to hold true 
> >> >> > > in containerized and cloud-based environments. Definitely think we 
> >> >> > > need to let the operators know, but graceful degradation of the 
> >> >> > > database (in a step-wise plateau-based fashion like this, not a 
> >> >> > > death spiral scenario to be clear) is much preferred IMO.
> >> >> > >
> >> >> > > On Tue, Dec 16, 2025, at 10:32 AM, Štefan Miklošovič wrote:
> >> >> > >
> >> >> > > Okay I guess that is a good compromise to make here. So warning in
> >> >> > > the logs + metrics? I think that metrics would be cool to have so we
> >> >> > > might chart how often it happens etc.
> >> >> > >
> >> >> > > On Tue, Dec 16, 2025 at 4:27 PM C. Scott Andreas 
> >> >> > > <[email protected]> wrote:
> >> >> > > >
> >> >> > > > One example where lack of a fallback would be problematic is:
> >> >> > > >
> >> >> > > > – User provisions AWS metal-class instances that expose hardware 
> >> >> > > > QAT and adopts.
> >> >> > > > – User needs to expand cluster or replace failed hardware.
> >> >> > > > – Insufficient hardware-QAT-capable machines available from AWS
> >> >> > > > – Cassandra unable to start on replacement/expanded machines due 
> >> >> > > > to lack of fallback.
> >> >> > > >
> >> >> > > > There are a handful of cases where the database performs similar 
> >> >> > > > fallbacks today, such as attempting mlockall on startup for 
> >> >> > > > improved memory locality and to avoid allocation stalls.
> >> >> > > >
> >> >> > > > As a user, I'd rather have a WARN in my logs than to be unable to 
> >> >> > > > start the database without changing cluster-wide configuration 
> >> >> > > > like schema / compaction parameters.
> >> >> > > >
> >> >> > > > – Scott
> >> >> > > >
> >> >> > > > On Dec 16, 2025, at 5:18 AM, Štefan Miklošovič 
> >> >> > > > <[email protected]> wrote:
> >> >> > > >
> >> >> > > >
> >> >> > > > I am open to adding some kind of metrics when it falls back, to track
> >> >> > > > if / how often it failed by hardware etc. Wondering what others think
> >> >> > > > about falling back just like that. I feel like something is not
> >> >> > > > transparent to a user who relies on hardware compression in the first
> >> >> > > > place.
> >> >> > > >
> >> >> > > > On Tue, Dec 16, 2025 at 1:52 PM Štefan Miklošovič
> >> >> > > > <[email protected]> wrote:
> >> >> > > >
> >> >> > > >
> >> >> > > > My personal preference is to not do any fallbacking. The reason for
> >> >> > > > that is that failures should be transparent and if it is meant to
> >> >> > > > fail, so be it.
> >> >> > > >
> >> >> > > > If we wrap it in try-catch and fall back, then a user thinks that
> >> >> > > > everything is just fine, right? There is no visibility into whether
> >> >> > > > and how often it failed, so that a user could act on it. By falling
> >> >> > > > back, a user is kind of misled: they think that all is just fine,
> >> >> > > > while they cannot wrap their head around the fact that they bought
> >> >> > > > hardware which says that their compression will be accelerated, yet
> >> >> > > > when looking at their dashboards they see, every now and then, the
> >> >> > > > same performance as if they were compressing in software.
> >> >> > > >
> >> >> > > > If they see that it is failing then they can reach out to the vendor
> >> >> > > > of such hardware, then raise complaints and issues with it so the
> >> >> > > > vendor's engineers can look into why it failed and how to fix it.
> >> >> > > > Instead of just wrapping it in one try-catch and acting like all is
> >> >> > > > actually fine. A user bought hardware to compress it, I do not think
> >> >> > > > they are interested in "best-effort" here. If that hardware fails, or
> >> >> > > > the software which is managing it is erroneous, then it should be
> >> >> > > > either fixed or replaced.
> >> >> > > >
> >> >> > > > On Tue, Dec 16, 2025 at 2:29 AM Kokoori, Shylaja
> >> >> > > > <[email protected]> wrote:
> >> >> > > > >
> >> >> > > > > Hi Stefan,
> >> >> > > > > Thank you very much for the feedback.
> >> >> > > > > You are correct, QAT is available on-die and not hot-plugged,
> >> >> > > > > and under normal circumstances we shouldn't encounter this
> >> >> > > > > exception. However, I wanted to add reverting to the base
> >> >> > > > > compressor to make it fault-tolerant.
> >> >> > > > >
> >> >> > > > > While the QAT software stack includes built-in retries and
> >> >> > > > > software fallbacks for scenarios when devices end up being busy
> >> >> > > > > etc., I didn't want operations that would otherwise have
> >> >> > > > > succeeded to fail due to transient hardware issues. For
> >> >> > > > > example, if some unrecoverable error occurs during a
> >> >> > > > > compress/decompress operation (whether due to a hardware issue
> >> >> > > > > or caused by related software libraries), the system can
> >> >> > > > > gracefully revert to the base compressor rather than failing
> >> >> > > > > the operation entirely.
> >> >> > > > >
> >> >> > > > > I am open to other suggestions.
> >> >> > > > > Thanks,
> >> >> > > > > Shylaja
> >> >> > > > > ________________________________
> >> >> > > > > From: Štefan Miklošovič <[email protected]>
> >> >> > > > > Sent: Monday, December 15, 2025 2:50 PM
> >> >> > > > > To: [email protected] <[email protected]>
> >> >> > > > > Subject: Re: [VOTE] CEP-49: Hardware-accelerated compression
> >> >> > > > >
> >> >> > > > > Hi Shylaja,
> >> >> > > > >
> >> >> > > > > I am going through the CEP so I can make the decision when voting
> >> >> > > > > and I want to clarify a few things.
> >> >> > > > >
> >> >> > > > > You say there:
> >> >> > > > >
> >> >> > > > > Both the default compressor instance and a plugin compressor instance
> >> >> > > > > (obtained from the provider), will be maintained by Cassandra. For
> >> >> > > > > subsequent read/write operations, the plugin compressor will be used.
> >> >> > > > > However, if the plugin version encounters an error, the default
> >> >> > > > > compressor will handle the operation.
> >> >> > > > >
> >> >> > > > > Why are we doing this kind of "fallback"? Under what circumstances
> >> >> > > > > "the plugin version encounters an error"? Why would it? It might be
> >> >> > > > > understandable to do it like that if that compression accelerator
> >> >> > > > > were some "plug and play" device or we could just remove it from a
> >> >> > > > > running machine. But this does not seem to be the case? QAT you are
> >> >> > > > > mentioning is baked into the CPU, right? It is not like we would
> >> >> > > > > decide to just turn it off suddenly at runtime so the database would
> >> >> > > > > need to deal with it.
> >> >> > > > >
> >> >> > > > > The reason I am asking is that I just briefly went over the PR and the
> >> >> > > > > way it works there is that if plugin de/compression is not possible
> >> >> > > > > (it throws IOException) then it will default to a software solution.
> >> >> > > > > This is done for every single de/compression of a chunk.
> >> >> > > > >
> >> >> > > > > Is this design an absolute must?
> >> >> > > > >
> >> >> > > > >
> >> >> > > > > On Mon, Dec 15, 2025 at 8:14 PM Josh McKenzie 
> >> >> > > > > <[email protected]> wrote:
> >> >> > > > > >
> >> >> > > > > > Yes but it's in reply to the discussion thread and so it 
> >> >> > > > > > threads that way in clients
> >> >> > > > > >
> >> >> > > > > > Apparently not in fastmail's client because it shows up as 
> >> >> > > > > > its own thread for me. /sigh
> >> >> > > > > >
> >> >> > > > > > Hence the confusion. Makes sense now.
> >> >> > > > > >
> >> >> > > > > > On Mon, Dec 15, 2025, at 1:18 PM, Kokoori, Shylaja wrote:
> >> >> > > > > >
> >> >> > > > > > Thank you for your feedback, Patrick & Brandon. I have 
> >> >> > > > > > created a new email thread like you suggested. Hopefully, 
> >> >> > > > > > this works.
> >> >> > > > > >
> >> >> > > > > > -Shylaja
> >> >> > > > > >
> >> >> > > > > > ________________________________
> >> >> > > > > > From: Patrick McFadin <[email protected]>
> >> >> > > > > > Sent: Monday, December 15, 2025 9:26 AM
> >> >> > > > > > To: [email protected] <[email protected]>
> >> >> > > > > > Subject: Re: [VOTE] CEP-49: Hardware-accelerated compression
> >> >> > > > > >
> >> >> > > > > > That was my point. It's a [DISCUSS] thread.
> >> >> > > > > >
> >> >> > > > > > On Mon, Dec 15, 2025 at 9:22 AM Brandon Williams 
> >> >> > > > > > <[email protected]> wrote:
> >> >> > > > > >
> >> >> > > > > > On Mon, Dec 15, 2025 at 11:13 AM Josh McKenzie 
> >> >> > > > > > <[email protected]> wrote:
> >> >> > > > > > >
> >> >> > > > > > > Can you put this into a [VOTE] thread?
> >> >> > > > > > >
> >> >> > > > > > > I'm confused - isn't the subject of this email [VOTE]?
> >> >> > > > > >
> >> >> > > > > > Yes but it's in reply to the discussion thread and so it threads
> >> >> > > > > > that way in clients, making it easy to overlook.
> >> >> > > > > >
> >> >> > > > > > Kind Regards,
> >> >> > > > > > Brandon
> >> >> > > > > >
> >> >> > > > > >
> >> >> > > >
> >> >> > > >
> >> >> > > >
> >> >> > >
> >> >> > >
> >> >> >
> >> >> >
