[ 
https://issues.apache.org/jira/browse/CASSANDRA-17021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18028472#comment-18028472
 ] 

Yifan Cai commented on CASSANDRA-17021:
---------------------------------------

Agreed. My preference is also to just enable a flag, say 
"--use-existing-sstables", and let the tool find the chunks to sample. 

Roughly 10 MiB is what we want to collect before training. We should be able to 
do some calculation, filter SSTables, and read chunks directly, which avoids 
reading through whole SSTables, as you mentioned. 
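
To make the calculation concrete, here is a rough sketch in Java. It is only an 
illustration under assumptions: the class and method names (ChunkSamplePlanner, 
chunksNeeded, plan) are hypothetical and not part of the patch, the 16 KiB chunk 
length is an assumed value, and the round-robin split across SSTables is one 
possible policy rather than the one the tool will necessarily use. 

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Cassandra: plans how many chunks to
// sample from each SSTable to collect roughly targetBytes of data.
final class ChunkSamplePlanner {
    /** Chunks of chunkLength bytes needed to reach targetBytes, rounded up. */
    static int chunksNeeded(long targetBytes, int chunkLength) {
        if (targetBytes <= 0 || chunkLength <= 0)
            throw new IllegalArgumentException("sizes must be positive");
        return (int) ((targetBytes + chunkLength - 1) / chunkLength);
    }

    /**
     * Filters out SSTables smaller than one chunk, then hands out the chunk
     * budget round-robin, never asking an SSTable for more chunks than it holds.
     * Keys are SSTable names; values in the input are uncompressed sizes in bytes.
     */
    static Map<String, Integer> plan(Map<String, Long> uncompressedSizes,
                                     long targetBytes, int chunkLength) {
        int remaining = chunksNeeded(targetBytes, chunkLength);
        Map<String, Integer> available = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : uncompressedSizes.entrySet()) {
            long chunks = e.getValue() / chunkLength; // whole chunks only
            if (chunks > 0)
                available.put(e.getKey(), (int) Math.min(chunks, Integer.MAX_VALUE));
        }
        Map<String, Integer> plan = new LinkedHashMap<>();
        boolean progressed = true;
        while (remaining > 0 && progressed) {
            progressed = false;
            for (Map.Entry<String, Integer> e : available.entrySet()) {
                if (remaining == 0) break;
                int taken = plan.getOrDefault(e.getKey(), 0);
                if (taken < e.getValue()) {
                    plan.put(e.getKey(), taken + 1);
                    remaining--;
                    progressed = true;
                }
            }
        }
        return plan; // may cover less than targetBytes if the SSTables are small
    }

    public static void main(String[] args) {
        Map<String, Long> sizes = new LinkedHashMap<>();
        sizes.put("sstable-1", 48L * 1024);         // assumed sizes, for illustration
        sizes.put("sstable-2", 512L * 1024 * 1024);
        System.out.println(plan(sizes, 10L * 1024 * 1024, 16 * 1024));
    }
}
```

With the plan in hand, the tool would only need to seek to the selected chunk 
offsets in each SSTable instead of scanning the data files end to end. 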

If it sounds good to you too, I will address it in this patch. 

> Enhance Zstd support in Cassandra with dictionaries
> ---------------------------------------------------
>
>                 Key: CASSANDRA-17021
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17021
>             Project: Apache Cassandra
>          Issue Type: Improvement
>          Components: Feature/Compression
>            Reporter: Dinesh Joshi
>            Assignee: Yifan Cai
>            Priority: Normal
>          Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> Currently Cassandra supports Zstd compression. However, Zstd also supports 
> dictionaries, which improve not only the compression ratio but also the speed. 
> Dictionaries can show 3-4x savings. We should add support for training 
> dictionaries, ideally per SSTable, as this will yield the maximum gains.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

