[ 
https://issues.apache.org/jira/browse/CASSANDRA-17021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17687740#comment-17687740
 ] 

Yifan Cai commented on CASSANDRA-17021:
---------------------------------------

Hi [~smiklosovic], this JIRA somehow slipped through the cracks. However, I do 
have two prototypes built already: 1) a dictionary per SSTable, and 2) a 
dictionary per Cassandra table, where the table's dictionaries are updated and 
stored periodically (similar to what you described). I have also done a 
performance evaluation of both prototypes and have some preliminary results 
(which I do not want to share at this moment). 

I was on leave from the beginning of the year until this week. I will post 
updates in March. I am assigning the ticket back to myself. 

bq. What happens when dictionary gets lost or if it is corrupted? Are data 
"uncompressable" for ever? How does uncompressing on data without dictionary 
work?

If data is compressed with a dictionary, the exact same dictionary has to be 
used for decompression. If that dictionary is lost or corrupted, the data 
compressed with it cannot be read back, so the scenarios in the question lead 
to data loss. 
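
To make the failure mode concrete, here is a minimal sketch. It is not the 
prototype code: it uses the preset-dictionary support of zlib from the Python 
standard library as a stand-in for Zstd dictionaries, and the sample row, the 
dictionary content and the helper names are all made up for illustration. 

```python
import zlib

# Hypothetical sample data. zlib's preset dictionary stands in for a Zstd
# dictionary here; the requirement illustrated is the same for both.
DICTIONARY = b'{"user_id": , "event_type": "page_view", "region": "us-east"}'
ROW = b'{"user_id": 42, "event_type": "page_view", "region": "us-east"}'


def compress_with_dict(data: bytes, dictionary: bytes) -> bytes:
    """Compress data using a preset dictionary."""
    compressor = zlib.compressobj(zdict=dictionary)
    return compressor.compress(data) + compressor.flush()


def try_decompress(blob: bytes, dictionary):
    """Return the decompressed bytes, or None if the blob cannot be read."""
    try:
        if dictionary is None:
            # Dictionary lost: attempt a plain decompression.
            return zlib.decompress(blob)
        decompressor = zlib.decompressobj(zdict=dictionary)
        return decompressor.decompress(blob) + decompressor.flush()
    except zlib.error:
        return None


blob = compress_with_dict(ROW, DICTIONARY)

print(try_decompress(blob, DICTIONARY) == ROW)    # same dictionary
print(try_decompress(blob, None))                 # dictionary lost
print(try_decompress(blob, DICTIONARY[:-1]))      # dictionary corrupted
```

Only the first call gets the row back. The other two fail because the stream 
records that it needs a specific dictionary. 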

Given that the dictionary size can be limited, it is feasible to just embed the 
dictionary within the CompressionInfo. This eases the handling of dictionaries, 
with (arguably) negligible space overhead from duplicating the dictionary 
content (several KB) in each SSTable. SSTables smaller than a size threshold 
should be compressed without a dictionary.
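
A rough sketch of that idea, under stated assumptions: the class and function 
names below are hypothetical, the threshold value is arbitrary, zlib from the 
Python standard library again stands in for Zstd, and the real CompressionInfo 
holds chunk metadata rather than the compressed payload itself. 

```python
import zlib
from dataclasses import dataclass
from typing import Optional

# Hypothetical threshold in bytes; the comment does not name a value.
MIN_SIZE_FOR_DICTIONARY = 1024


@dataclass
class CompressionInfoSketch:
    """Simplified stand-in for per-SSTable compression metadata."""
    dictionary: Optional[bytes]  # embedded copy, or None if no dictionary used
    payload: bytes               # compressed data (kept here for brevity)


def write_sstable(data: bytes, dictionary: bytes) -> CompressionInfoSketch:
    """Compress data, embedding the dictionary only above the threshold."""
    if len(data) < MIN_SIZE_FOR_DICTIONARY:
        # Small SSTable: skip the dictionary to avoid the space overhead.
        return CompressionInfoSketch(None, zlib.compress(data))
    compressor = zlib.compressobj(zdict=dictionary)
    payload = compressor.compress(data) + compressor.flush()
    return CompressionInfoSketch(dictionary, payload)


def read_sstable(info: CompressionInfoSketch) -> bytes:
    """Decompress using whatever dictionary travels with the metadata."""
    if info.dictionary is None:
        return zlib.decompress(info.payload)
    decompressor = zlib.decompressobj(zdict=info.dictionary)
    return decompressor.decompress(info.payload) + decompressor.flush()
```

Because the dictionary travels with each SSTable, the reader never has to look 
it up elsewhere, at the cost of one dictionary copy per SSTable. 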

> Enhance Zstd support in Cassandra with dictionaries
> ---------------------------------------------------
>
>                 Key: CASSANDRA-17021
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17021
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Feature/Compression
>            Reporter: Dinesh Joshi
>            Assignee: Stefan Miklosovic
>            Priority: Normal
>
> Currently Cassandra supports zstd compression. However, Zstd also supports 
> dictionaries to enhance not only the compression ratio but also the speed. 
> Dictionaries can show 3-4x savings. We should add support to train 
> dictionaries, ideally per SSTable, as this will yield the maximum gains.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

