[ 
https://issues.apache.org/jira/browse/CASSANDRA-17021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18028242#comment-18028242
 ] 

Jon Haddad commented on CASSANDRA-17021:
----------------------------------------

I've provided some docs for this patch, which Yifan has merged into his branch. 
 Some minor usability things:

It would be nice if this could tell me some stats about what's been sampled.  
Do we have anything available?  Something like this could be nice:


{noformat}
nodetool traincompressiondictionary --status chunk_4kb keyvalue
Trainer is collecting sample data for chunk_4kb.keyvalue.  150,000 rows sampled, 9 minutes remaining.
{noformat}
 

Can we check once a minute by default for new dictionaries?  

I'm finding that training can fail if memtables aren't flushed to disk.  This 
can happen on systems with a lot of memory, where memtables can absorb many 
writes before flushing.  I didn't get a flush despite 10 million writes, and 
as a result, training failed:
{noformat}
Caused by: java.lang.IllegalStateException: Insufficient samples for training: 0 (minimum required: 10)
    at org.apache.cassandra.db.compression.ZstdDictionaryTrainer.trainDictionary(ZstdDictionaryTrainer.java:118)
    at org.apache.cassandra.db.compression.ICompressionDictionaryTrainer.lambda$trainDictionaryAsync$0(ICompressionDictionaryTrainer.java:81)
    at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768)
    ... 9 common frames omitted
{noformat}

I don't think this is a deal breaker, as it'll still be useful for a lot of 
people, but I think this will bite some folks.
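
One possible workaround for the failure above is to force a flush of the table so that its data reaches SSTables. The sketch below wraps `nodetool flush`, which is an existing nodetool command; the helper name `flush_table` and the `NODETOOL` override variable are hypothetical and introduced only for illustration. It assumes the trainer samples only data that is written to disk, as the failure suggests. Whether the flush has to happen before or during the sampling window is not stated here, so treat this as a sketch rather than a confirmed fix.

```shell
#!/bin/sh
# Hypothetical helper (not part of Cassandra): force a table's memtables to disk.
# NODETOOL can be overridden to point at a specific nodetool binary.
NODETOOL="${NODETOOL:-nodetool}"

flush_table() {
    # Require exactly a keyspace and a table name.
    if [ "$#" -ne 2 ]; then
        echo "usage: flush_table <keyspace> <table>" >&2
        return 2
    fi
    # `nodetool flush <keyspace> <table>` flushes that table's memtables to SSTables.
    "$NODETOOL" flush "$1" "$2"
}
```

For the table in the status example above, this would be called as `flush_table chunk_4kb keyvalue`, followed by re-running the training command.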

 

> Enhance Zstd support in Cassandra with dictionaries
> ---------------------------------------------------
>
>                 Key: CASSANDRA-17021
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17021
>             Project: Apache Cassandra
>          Issue Type: Improvement
>          Components: Feature/Compression
>            Reporter: Dinesh Joshi
>            Assignee: Yifan Cai
>            Priority: Normal
>          Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Currently Cassandra supports Zstd compression. However, Zstd also supports 
> dictionaries, which improve not only the compression ratio but also the speed. 
> Dictionaries can show 3-4x savings. We should add support for training 
> dictionaries, ideally per SSTable, since this will yield the maximum gains.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

