Re: [VOTE] CEP-49: Hardware-accelerated compression

Štefan Miklošovič Tue, 16 Dec 2025 07:33:12 -0800

Okay, I guess that is a good compromise to make here. So a warning in the
logs + metrics? I think that metrics would be cool to have so we can
chart how often it happens etc.
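
To make "warning in the logs + metrics" concrete, here is a minimal sketch
of what it could look like. It is not the code from the PR: ChunkCompressor,
FallbackCompressor and fallbackCount() are hypothetical names invented for
illustration, the counter is a plain LongAdder rather than a real Cassandra
metric, and a real version would probably rate-limit the warning.

```java
import java.io.IOException;
import java.util.concurrent.atomic.LongAdder;
import java.util.logging.Logger;

// Hypothetical stand-in for a chunk compressor; this is NOT Cassandra's ICompressor.
interface ChunkCompressor {
    byte[] compress(byte[] input) throws IOException;
}

// Tries the accelerated compressor first; on IOException it logs a warning,
// bumps a counter, and retries the same input with the software compressor.
final class FallbackCompressor implements ChunkCompressor {
    private static final Logger logger =
        Logger.getLogger(FallbackCompressor.class.getName());

    private final ChunkCompressor accelerated;
    private final ChunkCompressor software;
    // Assumption: a simple counter stands in for whatever metric would be exposed.
    private final LongAdder fallbacks = new LongAdder();

    FallbackCompressor(ChunkCompressor accelerated, ChunkCompressor software) {
        this.accelerated = accelerated;
        this.software = software;
    }

    @Override
    public byte[] compress(byte[] input) throws IOException {
        try {
            return accelerated.compress(input);
        } catch (IOException e) {
            fallbacks.increment();
            // Logged on every failure here; unthrottled logging could flood the logs.
            logger.warning("Accelerated compression failed, using software compressor: "
                           + e.getMessage());
            return software.compress(input);
        }
    }

    // Number of operations that had to fall back so far.
    long fallbackCount() {
        return fallbacks.sum();
    }
}
```

With something like this, an operator could chart fallbackCount() over time
and go to the vendor when it starts climbing.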


On Tue, Dec 16, 2025 at 4:27 PM C. Scott Andreas <[email protected]> wrote:
>
> One example where lack of a fallback would be problematic is:
>
> – User provisions AWS metal-class instances that expose hardware QAT and 
> adopts.
> – User needs to expand cluster or replace failed hardware.
> – Insufficient hardware-QAT-capable machines available from AWS
> – Cassandra unable to start on replacement/expanded machines due to lack of 
> fallback.
>
> There are a handful of cases where the database performs similar fallbacks 
> today, such as attempting mlockall on startup for improved memory locality 
> and to avoid allocation stalls.
>
> As a user, I'd rather have a WARN in my logs than to be unable to start the 
> database without changing cluster-wide configuration like schema / compaction 
> parameters.
>
> – Scott
>
> On Dec 16, 2025, at 5:18 AM, Štefan Miklošovič <[email protected]> wrote:
>
>
> I am open to adding some kind of metrics for when it falls back, to track
> if / how often the hardware path failed etc. I wonder what others think
> about falling back just like that. I feel like something is not
> transparent to a user who relies on hardware compression in the first
> place.
>
> On Tue, Dec 16, 2025 at 1:52 PM Štefan Miklošovič
> <[email protected]> wrote:
>
>
> My personal preference is to not do any fallback. The reason for
> that is that failures should be transparent, and if it is meant to fail,
> so be it.
>
> If we wrap it in try-catch and fall back, then a user thinks that
> everything is just fine, right? There is no visibility into whether
> and how often it failed, so a user cannot act on it. By falling back,
> we kind of mislead users: they think that all is just fine, yet
> they cannot wrap their heads around the fact that they bought hardware
> which promises accelerated compression while their dashboards
> every now and then show the same performance as
> if they were compressing in software.
>
> If they see that it is failing, then they can reach out to the vendor
> of such hardware and raise complaints and issues with it so the
> vendor's engineers can look into why it failed and how to fix it.
> That is better than just wrapping it in one try-catch and acting like
> all is actually fine. A user bought hardware to accelerate compression;
> I do not think they are interested in "best-effort" here. If that
> hardware fails, or the software which is managing it is erroneous, then
> it should be either fixed or replaced.
>
> On Tue, Dec 16, 2025 at 2:29 AM Kokoori, Shylaja
> <[email protected]> wrote:
> >
> > Hi Stefan,
> > Thank you very much for the feedback.
> > You are correct, QAT is available on-die and not hot-plugged, and under 
> > normal circumstances, we shouldn't encounter this exception. However, 
> > I wanted to add a fallback to the base compressor to make it fault-tolerant.
> >
> > While the QAT software stack includes built-in retries and software 
> > fallbacks for scenarios when devices end up being busy etc., I didn't want 
> > operations that would otherwise have succeeded to fail due to transient 
> > hardware issues. For example, if some unrecoverable error occurs 
> > during a compress/decompress operation, whether due to a hardware issue or 
> > caused by related software libraries, the system can gracefully revert to 
> > the base compressor rather than failing the operation entirely.
> >
> > I am open to other suggestions.
> > Thanks,
> > Shylaja
> > ________________________________
> > From: Štefan Miklošovič <[email protected]>
> > Sent: Monday, December 15, 2025 2:50 PM
> > To: [email protected] <[email protected]>
> > Subject: Re: [VOTE] CEP-49: Hardware-accelerated compression
> >
> > Hi Shylaja,
> >
> > I am going through the CEP so I can make a decision when voting and I
> > want to clarify a few things.
> >
> > You say there:
> >
> > Both the default compressor instance and a plugin compressor instance
> > (obtained from the provider), will be maintained by Cassandra. For
> > subsequent read/write operations, the plugin compressor will be used.
> > However, if the plugin version encounters an error, the default
> > compressor will handle the operation.
> >
> > Why are we doing this kind of "fallback"? Under what circumstances
> > would "the plugin version encounter an error"? Why would it? It might be
> > understandable to do it like that if the compression accelerator
> > were some "plug and play" device that we could just remove from a
> > running machine. But this does not seem to be the case? The QAT you are
> > mentioning is baked into the CPU, right? It is not like we would
> > decide to just suddenly turn it off at runtime so the database would
> > need to deal with it.
> >
> > The reason I am asking is that I just briefly went over the PR and the
> > way it works there is that if plugin de/compression is not possible
> > (it throws IOException) then it will default to a software solution.
> > This is done for every single de/compression of a chunk.
> >
> > Is this design an absolute must?
> >
> >
> > On Mon, Dec 15, 2025 at 8:14 PM Josh McKenzie <[email protected]> wrote:
> > >
> > > Yes but it's in reply to the discussion thread and so it threads that way 
> > > in clients
> > >
> > > Apparently not in fastmail's client because it shows up as its own thread 
> > > for me. /sigh
> > >
> > > Hence the confusion. Makes sense now.
> > >
> > > On Mon, Dec 15, 2025, at 1:18 PM, Kokoori, Shylaja wrote:
> > >
> > > Thank you for your feedback, Patrick & Brandon. I have created a new 
> > > email thread like you suggested. Hopefully, this works.
> > >
> > > -Shylaja
> > >
> > > ________________________________
> > > From: Patrick McFadin <[email protected]>
> > > Sent: Monday, December 15, 2025 9:26 AM
> > > To: [email protected] <[email protected]>
> > > Subject: Re: [VOTE] CEP-49: Hardware-accelerated compression
> > >
> > > That was my point. It's a [DISCUSS] thread.
> > >
> > > On Mon, Dec 15, 2025 at 9:22 AM Brandon Williams <[email protected]> wrote:
> > >
> > > On Mon, Dec 15, 2025 at 11:13 AM Josh McKenzie <[email protected]> 
> > > wrote:
> > > >
> > > > Can you put this into a [VOTE] thread?
> > > >
> > > > I'm confused - isn't the subject of this email [VOTE]?
> > >
> > > Yes but it's in reply to the discussion thread and so it threads that
> > > way in clients, making it easy to overlook.
> > >
> > > Kind Regards,
> > > Brandon
> > >
> > >
>
>
>
