[ https://issues.apache.org/jira/browse/CASSANDRA-17021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17687740#comment-17687740 ]
Yifan Cai commented on CASSANDRA-17021:
---------------------------------------
Hi [~smiklosovic], this JIRA somehow slipped through the cracks. However, I have
already built two prototypes: 1) a dictionary per SSTable, and 2) a dictionary per
Cassandra table, which periodically updates and stores the table's dictionaries
(similar to what you described). I have also evaluated the performance of both
prototypes and have some preliminary results, which I would rather not share at
this moment.
I was on leave from the beginning of the year until this week. I will post
updates in March. I am assigning the ticket back to myself.
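The second prototype (one dictionary per Cassandra table, updated periodically) presumably requires each SSTable to record which dictionary version it was written with, and older versions to be kept around. A minimal sketch of that bookkeeping follows, assuming Python's `zlib` preset dictionaries (`zdict`) as a stand-in for Zstd dictionaries. `TableDictionaryRegistry`, `write_chunk`, and `read_chunk` are hypothetical names, not Cassandra APIs, and this is not the prototype's actual design.

```python
import zlib
from typing import Dict, Optional, Tuple


class TableDictionaryRegistry:
    """Hypothetical per-table store of compression dictionaries.

    Each retraining publishes a new dictionary under a new id. Older ids are kept,
    because SSTables written earlier still need their exact dictionary to decompress.
    """

    def __init__(self) -> None:
        self._dicts: Dict[int, bytes] = {}
        self._current_id: Optional[int] = None

    def publish(self, dictionary: bytes) -> int:
        """Register a freshly trained dictionary and make it the current one."""
        new_id = 0 if self._current_id is None else self._current_id + 1
        self._dicts[new_id] = dictionary
        self._current_id = new_id
        return new_id

    def current(self) -> Tuple[int, bytes]:
        """Return (id, bytes) of the dictionary new writes should use."""
        if self._current_id is None:
            raise LookupError("no dictionary published yet")
        return self._current_id, self._dicts[self._current_id]

    def get(self, dict_id: int) -> bytes:
        """Look up an older dictionary by id."""
        if dict_id not in self._dicts:
            # Losing an entry here means losing the data compressed with it.
            raise LookupError(f"dictionary {dict_id} is missing")
        return self._dicts[dict_id]


def write_chunk(registry: TableDictionaryRegistry, data: bytes) -> Tuple[int, bytes]:
    """Compress with the current dictionary; the id must be persisted with the chunk."""
    dict_id, dictionary = registry.current()
    c = zlib.compressobj(zdict=dictionary)
    return dict_id, c.compress(data) + c.flush()


def read_chunk(registry: TableDictionaryRegistry, dict_id: int, blob: bytes) -> bytes:
    """Decompress with the exact dictionary the chunk was written with."""
    d = zlib.decompressobj(zdict=registry.get(dict_id))
    return d.decompress(blob) + d.flush()
```

The trade-off against embedding the dictionary in every SSTable is that the registry itself becomes critical state: if an entry is lost, every SSTable referencing it becomes unreadable.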
bq. What happens when the dictionary gets lost or is corrupted? Is the data
"uncompressable" forever? How does decompressing data without the dictionary
work?
If data is compressed with a dictionary, the exact same dictionary must be used
for decompression, so both scenarios in the question lead to data loss.
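The requirement that decompression use the exact same dictionary can be demonstrated with a short sketch. It uses Python's `zlib` preset dictionaries (`zdict`) rather than Zstd, because most Python versions ship no Zstd binding in the standard library; the failure modes are analogous, but this is not the Zstd API. `compress_with_dict` and `decompress_with_dict` are hypothetical helpers.

```python
import zlib
from typing import Optional


def compress_with_dict(data: bytes, dictionary: bytes) -> bytes:
    """Compress `data` with a preset dictionary; the stream records the dictionary's Adler-32."""
    c = zlib.compressobj(zdict=dictionary)
    return c.compress(data) + c.flush()


def decompress_with_dict(blob: bytes, dictionary: Optional[bytes]) -> bytes:
    """Decompress `blob`; raises zlib.error if the dictionary is missing or different."""
    if dictionary is None:
        # The stream signals that a dictionary is required, so this raises zlib.error.
        return zlib.decompress(blob)
    d = zlib.decompressobj(zdict=dictionary)
    return d.decompress(blob) + d.flush()


# Usage: small, repetitive rows are where a dictionary pays off.
dictionary = b'{"user_id": , "country": "", "status": "active"}'
row = b'{"user_id": 42, "country": "NZ", "status": "active"}'
blob = compress_with_dict(row, dictionary)
assert decompress_with_dict(blob, dictionary) == row
```

Decompressing without the dictionary, or with a different one (detected through the dictionary checksum stored in the stream header), raises `zlib.error` instead of returning data; there is no fallback path.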
Given that the dictionary size can be limited, it is feasible to simply embed the
dictionary within the CompressionInfo component of each SSTable. This eases the
handling of dictionaries, at an (arguably) negligible space overhead from
duplicating the dictionary content (several KB) across SSTables. SSTables smaller
than a size threshold should be compressed without a dictionary.
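Embedding the dictionary per SSTable, with a size cutoff, might look like the toy header below. The layout (a 4-byte big-endian length, `0` meaning "no dictionary", followed by the raw dictionary bytes) and the 1 MiB threshold are assumptions for illustration only; they are not Cassandra's actual `CompressionInfo.db` format, and the comment does not give a threshold value. All function names are hypothetical.

```python
import struct
from typing import Optional, Tuple

# Hypothetical cutoff: the comment mentions a size threshold but no value.
MIN_SSTABLE_BYTES_FOR_DICT = 1 << 20  # 1 MiB, illustrative only


def pick_dictionary(sstable_bytes: int, table_dict: Optional[bytes]) -> Optional[bytes]:
    """Return the dictionary to embed, or None to compress without one."""
    if table_dict is None or sstable_bytes < MIN_SSTABLE_BYTES_FOR_DICT:
        return None
    return table_dict


def encode_dict_header(dictionary: Optional[bytes]) -> bytes:
    """Toy header: 4-byte big-endian length (0 = no dictionary), then the raw bytes."""
    raw = dictionary or b""
    return struct.pack(">I", len(raw)) + raw


def decode_dict_header(buf: bytes) -> Tuple[Optional[bytes], int]:
    """Return (dictionary or None, number of header bytes consumed)."""
    if len(buf) < 4:
        raise ValueError("truncated header")
    (length,) = struct.unpack_from(">I", buf, 0)
    if len(buf) < 4 + length:
        raise ValueError("truncated dictionary")
    if length == 0:
        return None, 4
    return bytes(buf[4:4 + length]), 4 + length
```

With a dictionary of a few KB, the per-SSTable cost is that many bytes plus the 4-byte length, which is the duplication overhead described above; because each SSTable carries its own copy, no external dictionary store has to survive for the data to stay readable.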
> Enhance Zstd support in Cassandra with dictionaries
> ---------------------------------------------------
>
> Key: CASSANDRA-17021
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17021
> Project: Cassandra
> Issue Type: Improvement
> Components: Feature/Compression
> Reporter: Dinesh Joshi
> Assignee: Stefan Miklosovic
> Priority: Normal
>
> Cassandra currently supports Zstd compression. Zstd also supports
> dictionaries, which improve not only the compression ratio but also the speed.
> Dictionaries can show 3-4x savings. We should add support for training
> dictionaries, ideally per SSTable, as this will yield the maximum gains.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)