Here are the meeting notes. https://docs.google.com/document/d/1Pnirz6sSYNrStlN3k90yUo-MIj9pp6lxM_RdiyJbcPA/edit?usp=sharing
We shared context about the ZSTD-with-dictionary prototypes and findings, and discussed the implementations with a focus on both SSTable compression and potential client-side compression benefits.

- Yifan

On Fri, Aug 1, 2025 at 1:33 PM Yifan Cai <yc25c...@gmail.com> wrote:

> I'm excited to hear about the interest in this feature! I'm scheduling a Google Meet for Tuesday at 9 AM PST for one hour to discuss ZSTD with dictionary compression in Cassandra. I will send the meeting details closer to the time of the meeting. Please send me an email if you would like to participate.
>
> - Yifan
>
> On Fri, Aug 1, 2025 at 12:58 PM Štefan Miklošovič <smikloso...@apache.org> wrote:
>
>> Sure! Please share the link to the call if possible. I will be glad to participate in this in whatever way I can.
>>
>> Regards
>>
>> On Fri, Aug 1, 2025 at 6:53 PM Dinesh Joshi <djo...@apache.org> wrote:
>>
>>> We have explored compressing using trained dictionaries at various levels - component, table, and keyspace. Component-level dictionary compression is obviously the most effective, but it results in a _lot_ of dictionaries. Anyway, this really needs a bit of thought. Since there is a lot of interest and prior work that each of us may have done, I would suggest we discuss the various approaches in this thread, or get on a quick call and bring the summary back to this list. Happy to organize a call if y'all are interested.
>>>
>>> On Fri, Aug 1, 2025 at 9:07 AM Štefan Miklošovič <smikloso...@apache.org> wrote:
>>>
>>>> Looking into my prototype (I think it is not doing anything yet, just WIP), I am training it on flush, so that is in line with what Jon is trying to do as well / what he suggests would be optimal.
>>>>
>>>> I do not have a dedicated dictionary component. What I tried to do was to just put the dict directly into COMPRESSION_INFO and then bump the SSTable version with a boolean saying whether it supports dictionaries or not. So there is at least one component fewer.
>>>>
>>>> On Fri, Aug 1, 2025 at 5:59 PM Yifan Cai <yc25c...@gmail.com> wrote:
>>>>
>>>>> Yeah. I have built 2 POCs and have initial benchmark data comparing w/ and w/o dictionary. Unfortunately, the work went to the backlog. I can pick it up again if there is demand for the feature.
>>>>>
>>>>> There are some discussions in the Jira that Stefan linked. (Thanks, Stefan!)
>>>>>
>>>>> - Yifan
>>>>>
>>>>> ------------------------------
>>>>> *From:* Štefan Miklošovič <smikloso...@apache.org>
>>>>> *Sent:* Friday, August 1, 2025 8:54:07 AM
>>>>> *To:* dev@cassandra.apache.org <dev@cassandra.apache.org>
>>>>> *Subject:* Re: zstd dictionaries
>>>>>
>>>>> There is already a ticket for this: https://issues.apache.org/jira/browse/CASSANDRA-17021
>>>>>
>>>>> I would love to see this in action. I was investigating this a few years ago, when ZSTD first landed in 4.0 I think, and I was discussing it with Yifan, if my memory serves me well, but, as with other things, it went nowhere and was probably forgotten. I think there might already be a POC around. I started to work on this a few years ago and abandoned it because ... I still have a branch around and it would be great to compare what you have etc.
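For anyone following along without one of these branches handy, the mechanics behind the prototypes come down to the dictionary API in zstd-jni, the binding Cassandra already bundles for its ZstdCompressor. The sketch below is purely illustrative: the sample data, sizes, and class name are made up, and a real prototype would train on memtable or SSTable chunk bytes and persist the resulting dictionary (in COMPRESSION_INFO, as in Štefan's branch, or as a separate component).

```java
import com.github.luben.zstd.ZstdCompressCtx;
import com.github.luben.zstd.ZstdDecompressCtx;
import com.github.luben.zstd.ZstdDictTrainer;

import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ZstdDictionaryRoundTrip
{
    public static void main(String[] args) throws Exception
    {
        // Stand-in for sampled, uncompressed chunks. A real prototype would feed
        // memtable partitions or SSTable chunk bytes; training needs a decent
        // number of representative samples or zstd will refuse to build a dictionary.
        List<byte[]> samples = new ArrayList<>();
        for (int i = 0; i < 2000; i++)
            samples.add(String.format("{\"user_id\":%d,\"event\":\"login\",\"region\":\"us-west-2\",\"ts\":%d}",
                                      i, 1_722_500_000L + i)
                              .getBytes(StandardCharsets.UTF_8));

        // Train a small dictionary (buffer and dictionary sizes are illustrative).
        ZstdDictTrainer trainer = new ZstdDictTrainer(1 << 20, 1024);
        for (byte[] sample : samples)
            trainer.addSample(sample);
        byte[] dictionary = trainer.trainSamples();

        byte[] chunk = "{\"user_id\":9999,\"event\":\"logout\",\"region\":\"us-west-2\",\"ts\":1722509999}"
                       .getBytes(StandardCharsets.UTF_8);

        // Compress one chunk with the dictionary, then decompress it again. Anyone
        // who needs to read the data back must also hold the dictionary, which is
        // why losing it is comparable to losing an encryption key.
        ZstdCompressCtx cctx = new ZstdCompressCtx();
        cctx.setLevel(3);
        cctx.loadDict(dictionary);
        byte[] compressed = cctx.compress(chunk);
        cctx.close();

        ZstdDecompressCtx dctx = new ZstdDecompressCtx();
        dctx.loadDict(dictionary);
        byte[] restored = dctx.decompress(compressed, chunk.length);
        dctx.close();

        System.out.printf("dict=%d bytes, chunk=%d -> %d bytes, round-trip ok=%b%n",
                          dictionary.length, chunk.length, compressed.length,
                          Arrays.equals(chunk, restored));
    }
}
```

The round trip also makes the recovery concern raised later in the thread concrete: without the exact dictionary bytes, the compressed chunks cannot be read back.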
>>>>> On Fri, Aug 1, 2025 at 5:12 PM Jon Haddad <j...@rustyrazorblade.com> wrote:
>>>>>
>>>>> Hi folks,
>>>>>
>>>>> I'm working with a team that's interested in seeing zstd dictionaries for SSTable compression implemented due to the potential space and cost savings. I wanted to share my initial thoughts and get the dev list's thoughts as well.
>>>>>
>>>>> According to the zstd documentation [1], dictionaries can provide approximately a 3x improvement in space savings compared to non-dictionary compression, along with roughly 4x faster compression and decompression performance. The site notes that "training works if there is some correlation in a family of small data samples. The more data-specific a dictionary is, the more efficient it is (there is no universal dictionary). Hence, deploying one dictionary per type of data will provide the greatest benefits."
>>>>>
>>>>> The implementation appears straightforward from a code perspective, but there are some architectural considerations I'd like to discuss:
>>>>>
>>>>> *Dictionary Management*
>>>>>
>>>>> One critical aspect is that the dictionary becomes essential for data recovery - if you lose the dictionary, you lose access to the compressed data, similar to losing an encryption key. (Please correct me if I'm misunderstanding this dependency.)
>>>>>
>>>>> *Storage Approach*
>>>>>
>>>>> I'm considering two options for storing the dictionary:
>>>>>
>>>>> 1. *SSTable Component*: Save the dictionary as a separate SSTable component alongside the existing files. My hesitation here is that we've traditionally maintained that Data.db is the only essential component.
>>>>>
>>>>> 2. *Data.db Header*: Embed the dictionary directly in the Data.db file header.
>>>>>
>>>>> I'm strongly leaning toward the component approach because it avoids modifications to the Data.db file format and can leverage our existing streaming infrastructure. I spoke with Blake about this, and it sounds like some of the newer features are more dependent on components other than Data, so I think this is acceptable.
>>>>>
>>>>> Dictionary Generation
>>>>>
>>>>> We currently default to flushing with LZ4, although I think that's only an optimization to avoid the higher overhead of zstd. Using the memtable data to create a dictionary prior to flush could remove the need for this optimization entirely.
>>>>>
>>>>> During compaction, my plan is to generate dictionaries by either sampling chunks from existing files (similar overhead to reading random rows) or using just the first pages of data from each SSTable. I'd need to do some testing to see what the optimal setup is here.
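As a rough illustration of that compaction-time sampling idea (a sketch only: the class, its chunk source, and the sizes are hypothetical, not existing Cassandra code), a bounded reservoir of uncompressed chunks read from the input SSTables, fed into zstd-jni's trainer before the output is written, would keep the extra read cost fixed regardless of how much data is being compacted:

```java
import com.github.luben.zstd.ZstdDictTrainer;

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper, not existing Cassandra code: collect a bounded, uniform
// sample of uncompressed chunks read from a compaction's input SSTables, then
// train a dictionary before the output SSTable is written.
public class CompactionDictionarySampler
{
    private final int maxSamples;
    private final List<byte[]> reservoir = new ArrayList<>();
    private long seen = 0;

    public CompactionDictionarySampler(int maxSamples)
    {
        this.maxSamples = maxSamples;
    }

    /** Offer every candidate chunk; only about maxSamples of them are retained. */
    public void offer(byte[] uncompressedChunk)
    {
        seen++;
        if (reservoir.size() < maxSamples)
        {
            reservoir.add(uncompressedChunk);
        }
        else
        {
            // Classic reservoir sampling: replace an existing entry with
            // probability maxSamples / seen, keeping the selection uniform
            // without knowing the total number of chunks up front.
            long slot = ThreadLocalRandom.current().nextLong(seen);
            if (slot < maxSamples)
                reservoir.set((int) slot, uncompressedChunk);
        }
    }

    /** Train a dictionary from the sampled chunks (sizes are illustrative). */
    public byte[] train(int dictSizeBytes)
    {
        ZstdDictTrainer trainer = new ZstdDictTrainer(16 * 1024 * 1024, dictSizeBytes);
        for (byte[] sample : reservoir)
            trainer.addSample(sample);
        return trainer.trainSamples(); // throws if the samples are too few or too uniform
    }
}
```

Reservoir sampling is just one option; using only the first pages of each input SSTable, as suggested above, would be cheaper still at the cost of a potentially less representative dictionary.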
>>>>> Opt-in: I think the initial version of this should be opt-in via a flag on compression, but assuming it delivers on the performance and space gains, I think we'd want to remove the flag and make it the default. Assuming this feature lands in 6.0, I'd be looking to make it on by default in 7.0 when using zstd. The performance table still lists LZ4 as more performant, so I think we'd probably leave it as the default compression strategy, although performance benchmarks should be our guide here.
>>>>>
>>>>> Questions for the Community
>>>>>
>>>>> - Has anyone already explored zstd dictionaries for Cassandra?
>>>>> - If so, are there existing performance tests or benchmarks?
>>>>> - Any thoughts on the storage approach or dictionary generation strategy?
>>>>> - Other considerations I might be missing?
>>>>>
>>>>> It seems like this would be a fairly easy win for improving density in clusters that are limited by disk space per node. It should also improve overall performance by reducing compression and decompression overhead. For the team I'm working with, we'd be reducing node count in AWS by several hundred nodes. We started with about 1K nodes at 4 TB / node, were able to remove roughly 700 of them with the introduction of CASSANDRA-15452 (now at approximately 13 TB / node), and are looking to cut the number at least in half again.
>>>>>
>>>>> Looking forward to hearing your thoughts.
>>>>>
>>>>> Thanks,
>>>>> Jon
>>>>>
>>>>> [1] https://facebook.github.io/zstd/