Okay, I guess that is a good compromise to make here. So a warning in the logs plus metrics? I think metrics would be good to have, so we can chart how often the fallback happens.
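To make "warning in the logs + metrics" concrete, here is a minimal sketch of what the per-chunk fallback could look like. Everything in it is hypothetical and for illustration only: `ChunkCompressor`, `FallbackCompressor` and `fallbackCount` are made-up names, not taken from the CEP-49 PR, and the `AtomicLong` and `java.util.logging` calls merely stand in for whatever metrics and logging facilities a real patch would use.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Logger;

// Hypothetical interface for illustration; this is not Cassandra's compressor API.
interface ChunkCompressor {
    byte[] compress(byte[] chunk) throws IOException;
}

// Tries the hardware-backed compressor first. On IOException it logs a warning,
// bumps a counter, and lets the software compressor handle the same chunk.
final class FallbackCompressor implements ChunkCompressor {
    private static final Logger LOG = Logger.getLogger(FallbackCompressor.class.getName());

    private final ChunkCompressor hardware;
    private final ChunkCompressor software;
    // Stand-in for a real metric that could be exported and charted.
    private final AtomicLong fallbackCount = new AtomicLong();

    FallbackCompressor(ChunkCompressor hardware, ChunkCompressor software) {
        this.hardware = hardware;
        this.software = software;
    }

    @Override
    public byte[] compress(byte[] chunk) throws IOException {
        try {
            return hardware.compress(chunk);
        } catch (IOException e) {
            long total = fallbackCount.incrementAndGet();
            // This fires once per failed chunk, so a real version would likely rate-limit it.
            LOG.warning("Hardware compression failed (" + e.getMessage()
                        + "), falling back to software; total fallbacks: " + total);
            return software.compress(chunk);
        }
    }

    // Number of chunks that were handled by the software compressor after a failure.
    long fallbackCount() {
        return fallbackCount.get();
    }
}
```

With a counter like this exposed, a user who paid for accelerated compression can see and alert on every fallback instead of silently getting software performance, which I think addresses the transparency concern while still letting the node start and serve requests.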
On Tue, Dec 16, 2025 at 4:27 PM C. Scott Andreas <[email protected]> wrote:
>
> One example where lack of a fallback would be problematic is:
>
> – User provisions AWS metal-class instances that expose hardware QAT and
> adopts it.
> – User needs to expand cluster or replace failed hardware.
> – Insufficient hardware-QAT-capable machines available from AWS.
> – Cassandra unable to start on replacement/expanded machines due to lack of
> fallback.
>
> There are a handful of cases where the database performs similar fallbacks
> today, such as attempting mlockall on startup for improved memory locality
> and to avoid allocation stalls.
>
> As a user, I'd rather have a WARN in my logs than be unable to start the
> database without changing cluster-wide configuration like schema / compaction
> parameters.
>
> – Scott
>
> On Dec 16, 2025, at 5:18 AM, Štefan Miklošovič <[email protected]> wrote:
>
>
> I am open to adding some kind of metrics when it falls back, to track
> if / how often it failed by hardware etc. Wondering what others think
> about falling back just like that. I feel like something is not
> transparent to a user who relies on hardware compression in the first
> place.
>
> On Tue, Dec 16, 2025 at 1:52 PM Štefan Miklošovič
> <[email protected]> wrote:
>
>
> My personal preference is to not do any fallback. The reason for
> that is that failures should be transparent and if it is meant to fail
> so be it.
>
> If we wrap it in try-catch and fall back, then a user thinks that
> everything is just fine, right? There is no visibility into whether
> and how often it failed so a user can act on that. By falling back, a
> user is kind of misled, as they think that all is just fine while
> they cannot wrap their head around the fact that they bought hardware
> which says that their compression will be accelerated while looking at
> their dashboards and every now and then seeing the same performance as
> if they were compressing by software.
>
> If they see that it is failing then they can reach out to the vendor
> of such hardware, then raise complaints and issues with it so the
> vendor's engineers can look into why it failed and how to fix it.
> Instead of just wrapping it in one try-catch and acting like all is
> actually fine. A user bought hardware to compress it, I do not think
> they are interested in "best-effort" here. If that hardware fails, or
> the software which is managing it is erroneous, then it should be
> either fixed or replaced.
>
> On Tue, Dec 16, 2025 at 2:29 AM Kokoori, Shylaja
> <[email protected]> wrote:
> >
> > Hi Stefan,
> > Thank you very much for the feedback.
> > You are correct, QAT is available on-die and not hot-plugged, and under
> > normal circumstances, we shouldn't encounter this exception. However, I
> > wanted to add reverting to base compressor to make it fault-tolerant.
> >
> > While the QAT software stack includes built-in retries and software
> > fallbacks for scenarios when devices end up being busy etc., I didn't want
> > operations to fail due to transient hardware issues which otherwise would
> > have succeeded. An example would be some unrecoverable error occurring
> > during a compress/decompress operation, whether due to a hardware issue or
> > caused by related software libraries: the system can gracefully revert to
> > the base compressor rather than failing the operation entirely.
> >
> > I am open to other suggestions.
> > Thanks,
> > Shylaja
> > ________________________________
> > From: Štefan Miklošovič <[email protected]>
> > Sent: Monday, December 15, 2025 2:50 PM
> > To: [email protected] <[email protected]>
> > Subject: Re: [VOTE] CEP-49: Hardware-accelerated compression
> >
> > Hi Shylaja,
> >
> > I am going through CEP so I can make the decision when voting and I
> > want to clarify a few things.
> >
> > You say there:
> >
> > Both the default compressor instance and a plugin compressor instance
> > (obtained from the provider), will be maintained by Cassandra. For
> > subsequent read/write operations, the plugin compressor will be used.
> > However, if the plugin version encounters an error, the default
> > compressor will handle the operation.
> >
> > Why are we doing this kind of "fallback"? Under what circumstances
> > "the plugin version encounters an error"? Why would it? It might be
> > understandable to do it like that if that compression accelerator
> > would be some "plug and play" or we could just remove it from a
> > running machine. But this does not seem to be the case? QAT you are
> > mentioning is baked into the CPU, right? It is not like we would
> > decide to just turn it suddenly off in runtime so the database would
> > need to deal with it.
> >
> > The reason I am asking is that I just briefly went over the PR and the
> > way it works there is that if plugin de/compression is not possible
> > (it throws IOException) then it will default to a software solution.
> > This is done for every single de/compression of a chunk.
> >
> > Is this design an absolute must?
> >
> >
> > On Mon, Dec 15, 2025 at 8:14 PM Josh McKenzie <[email protected]> wrote:
> > >
> > > Yes but it's in reply to the discussion thread and so it threads that way
> > > in clients
> > >
> > > Apparently not in fastmail's client because it shows up as its own thread
> > > for me. /sigh
> > >
> > > Hence the confusion. Makes sense now.
> > >
> > > On Mon, Dec 15, 2025, at 1:18 PM, Kokoori, Shylaja wrote:
> > >
> > > Thank you for your feedback, Patrick & Brandon. I have created a new
> > > email thread like you suggested. Hopefully, this works.
> > >
> > > -Shylaja
> > >
> > > ________________________________
> > > From: Patrick McFadin <[email protected]>
> > > Sent: Monday, December 15, 2025 9:26 AM
> > > To: [email protected] <[email protected]>
> > > Subject: Re: [VOTE] CEP-49: Hardware-accelerated compression
> > >
> > > That was my point. It's a [DISCUSS] thread.
> > >
> > > On Mon, Dec 15, 2025 at 9:22 AM Brandon Williams <[email protected]> wrote:
> > >
> > > On Mon, Dec 15, 2025 at 11:13 AM Josh McKenzie <[email protected]>
> > > wrote:
> > > >
> > > > Can you put this into a [VOTE] thread?
> > > >
> > > > I'm confused - isn't the subject of this email [VOTE]?
> > >
> > > Yes but it's in reply to the discussion thread and so it threads that
> > > way in clients, making it easy to overlook.
> > >
> > > Kind Regards,
> > > Brandon
