Hi Jindal,

Thanks for the questions. As Stefan mentioned (thanks to Stefan too), ZSTD dictionary compression is toggled via DDL at the table level, exactly the same way compression is configured for tables today. This is captured in the "New or Changed Public Interfaces" section of the CEP.
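To illustrate what "toggled via DDL at the table level" looks like in practice, here is a minimal sketch that renders ALTER TABLE statements carrying a compression map. The `ZstdCompressor` and `LZ4Compressor` classes and the `'enabled': 'false'` option exist in Cassandra today; the table name `ks.events` and the helper `alter_compression` are hypothetical and only for illustration. The dictionary-specific option names are defined in the CEP and are deliberately not guessed here.

```python
def alter_compression(table: str, options: dict) -> str:
    """Render a CQL ALTER TABLE statement that sets the table's compression map.

    `table` and `options` are passed through as-is; this is a string-building
    sketch, not a client for a running cluster.
    """
    rendered = ", ".join(f"'{key}': '{value}'" for key, value in options.items())
    return f"ALTER TABLE {table} WITH compression = {{{rendered}}};"


# Switch a (hypothetical) table to plain Zstd, without dictionaries.
enable_zstd = alter_compression(
    "ks.events", {"class": "ZstdCompressor", "compression_level": "3"}
)

# Switch back to LZ4.
use_lz4 = alter_compression("ks.events", {"class": "LZ4Compressor"})

# Turn compression off for the table altogether.
disable = alter_compression("ks.events", {"enabled": "false"})

for statement in (enable_zstd, use_lz4, disable):
    print(statement)
```

Enabling, changing, and disabling all go through the same per-table statement, which is why the answer to "is the process for disabling the feature the same?" is also yes.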
There is a cassandra-zstd-discuss Slack channel. Please join if you are interested.

Hi Dinesh,

A "way to allowlist compression strategies as admins" sounds useful. It could be a guardrail, if one does not exist yet, and it can be added separately.

"If the compression is left off, we should default to something sensible." Maybe you can elaborate? I think the current behavior is no compression if compression is not configured, which might not be ideal in some cases. Maybe there could be a flag that admins can turn on to always compress with the default compressor (LZ4) when no compression is configured? If that sounds interesting, the toggle could be added separately too.

- Yifan

On Fri, Sep 5, 2025 at 9:33 AM Dinesh Joshi <djo...@apache.org> wrote:

> On a related note, I don't recall if we have any way to allowlist
> compression strategies as admins? If not, it would be very helpful where
> the DB operator wants to avoid users that forget or do not set compression
> in their schema. If the compression is left off, we should default to
> something sensible.
>
> On Fri, Sep 5, 2025 at 8:54 AM Štefan Miklošovič <smikloso...@apache.org>
> wrote:
>
>> Hi,
>>
>> in table schema, there would be table compression configuration. Like new
>> options for enabling compression, sampling strategy etc.
>>
>> Then in cassandra.yaml, auto train, auto prune obsolete dicts, training
>> frequency, acceptance percentage, dictionary size and memory limits etc.
>>
>> I took this from cassandra-zstd-discuss channel where this aspect of that
>> was discussed and answered when I asked the same question as you.
>>
>> AFAIK it will be on _table level_.
>>
>> You would just alter your table and change compression to some other
>> compression strategy or you might just go to Zstd without dictionaries.
>>
>> Regards
>>
>> On Fri, Sep 5, 2025 at 5:23 PM Jindal, Himanshu <himan...@amazon.com>
>> wrote:
>>
>>> Hi Yifan,
>>> This looks very promising for customers aiming to improve Cassandra
>>> performance. I had a few questions on the user experience:
>>>
>>> - How does a user enable this feature, via YAML config or through CQL
>>> DDL?
>>> - If it’s CQL, is it applied at the keyspace or table level?
>>> - Is the process for disabling the feature the same?
>>>
>>> Thanks,
>>> Himanshu
>>>
>>>
>>> *From: *Yifan Cai <yc25c...@gmail.com>
>>> *Date: *Thursday, September 4, 2025 at 7:00 PM
>>> *To: *dev@cassandra.apache.org <dev@cassandra.apache.org>
>>> *Subject: *RE: [EXTERNAL] [DISCUSS] CEP-54: ZSTD Compression with
>>> Dictionary Support
>>>
>>> *CAUTION*: This email originated from outside of the organization. Do
>>> not click links or open attachments unless you can confirm the sender and
>>> know the content is safe.
>>>
>>> Noted with thanks.
>>>
>>> I agree that it does not need to be zstd specific. The additional dict
>>> information for CompressionInfo are dictionary id, dictionary bytes and
>>> checksum of id and content. It should be common for other dictionary-based
>>> compression algorithms. In terms of implementation, I will keep this in
>>> mind.
>>>
>>> - Yifan
>>>
>>> On Thu, Sep 4, 2025 at 5:49 PM David Capwell <dcapw...@apple.com> wrote:
>>>
>>> Thanks for bringing this out!
>>>
>>> My first question when quickly looking at this is can we make the
>>> CompressionInfo change agnostic to the algorithm or have the format change
>>> based off the algorithm? Lz4 has similar (though not as easy to use as
>>> zstd) feature and new algorithms might come out which we want to include
>>> later on; It would be a shame to have the format tightly coupled to zstd
>>> only.
>>>
>>> On Sep 4, 2025, at 1:50 PM, Yifan Cai <yc25c...@gmail.com> wrote:
>>>
>>> Hi community,
>>>
>>> We would like to propose *CEP-54: ZSTD Compression with Dictionary
>>> Support* for adoption by the community:
>>>
>>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-54%3A+ZSTD+with+Dictionary+SSTable+Compression
>>>
>>> This CEP proposes introducing ZSTD with dictionary compression for
>>> SSTables. This feature allows users who need it to achieve significant
>>> improvements in compression ratio and speed, leading to better performance
>>> and storage efficiency. This is an entirely opt-in feature.
>>>
>>> The proposed ZSTD with dictionary support will enable organizations to
>>> achieve:
>>>
>>> - Faster read/write performance.
>>> - Reduced storage footprint.
>>> - Increased storage device lifetime from fewer writes.
>>>
>>> Key design principles:
>>>
>>> - Zero impact on users who don't enable the feature.
>>> - Initial emphasis on simplicity, supporting a single global dictionary
>>> per table and manual training, while maintaining extensibility for future
>>> automation.
>>> - SSTable-attached dictionaries to ensure that operations like backup,
>>> restore, and streaming continue to work seamlessly.
>>> - Graceful fallback to standard ZSTD compression when a dictionary isn't
>>> available.
>>> - A critical design constraint to avoid a large number of unique
>>> dictionaries, which can hurt decompression speed.
>>>
>>> This enhancement addresses the need for better storage efficiency and
>>> performance by leveraging ZSTD dictionaries, while maintaining complete
>>> backward compatibility and requiring no changes to existing deployments
>>> that do not enable the feature.
>>>
>>> Thanks to Jon Haddad for bringing up the topic and providing feedback
>>> in shaping the design, and to Dinesh Joshi, Joey Lynch, Stefan Miklosovic,
>>> and Francisco Guerrero for providing design feedback.
>>>
>>> Thanks in advance for your time and feedback. Please keep the discussion
>>> on this mailing list thread.
>>>
>>> - Yifan