[jira] [Updated] (CASSANDRA-21209) Rework ZSTD dictionary compression logic to create a trainer per training

Stefan Miklosovic (Jira) Thu, 12 Mar 2026 09:10:07 -0700


     [ 
https://issues.apache.org/jira/browse/CASSANDRA-21209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Stefan Miklosovic updated CASSANDRA-21209:
------------------------------------------
    Description: We should rework the current code to remove the remaining 
notions of auto-training. We need to create a trainer on demand for each 
training session rather than keeping a single trainer for the whole manager. 
That will eventually let us implement auto-training as well, albeit a little 
differently (and more simply).  (was: We can specify how much sampling data there 
should be for training of a dictionary. The buffer for training is a direct 
buffer. If we say that we will be training on 2GiB, then it will try to create 
a direct buffer of size 2GiB.

This problem is visible, e.g., when starting Cassandra via IDEA, which uses a 
1GiB heap. I think the maximum direct memory size for the JVM is basically the 
same as -Xmx, so the allocation fails. 

The fix would consist of wrapping the creation of that buffer in a try-catch 
and propagating the error in a sanitized form up to the caller, informing them 
that it is not possible to create such a big sampling buffer.)
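
As an illustration of the fix outlined in the previous description, here is a 
minimal sketch. It is not the actual Cassandra code: the class name, the 
method names, the injectable allocator parameter and the choice of 
IllegalArgumentException are all assumptions made for this example. The only 
library behaviour relied on is that ByteBuffer.allocateDirect throws 
OutOfMemoryError when the JVM cannot reserve the requested direct memory.

```java
import java.nio.ByteBuffer;
import java.util.function.IntFunction;

// Hypothetical helper, not part of Cassandra.
public class SamplingBufferSketch {

    /**
     * Allocates the sampling buffer through the given allocator and converts
     * an OutOfMemoryError into an exception with a readable message.
     * The allocator parameter exists only so the failure path can be exercised.
     */
    static ByteBuffer allocateSamplingBuffer(int sizeInBytes, IntFunction<ByteBuffer> allocator) {
        try {
            return allocator.apply(sizeInBytes);
        } catch (OutOfMemoryError e) {
            // Sanitized error for the caller instead of a raw OutOfMemoryError.
            throw new IllegalArgumentException(
                "Unable to create a sampling buffer of " + sizeInBytes
                + " bytes; reduce the sampling size or raise the direct memory limit", e);
        }
    }

    /** Convenience overload that allocates a real direct buffer. */
    static ByteBuffer allocateSamplingBuffer(int sizeInBytes) {
        return allocateSamplingBuffer(sizeInBytes, ByteBuffer::allocateDirect);
    }

    public static void main(String[] args) {
        ByteBuffer small = allocateSamplingBuffer(1024);
        System.out.println("direct=" + small.isDirect() + ", capacity=" + small.capacity());
    }
}
```

A caller would catch the IllegalArgumentException and report its message 
rather than letting the OutOfMemoryError escape from the training command.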

> Rework ZSTD dictionary compression logic to create a trainer per training
> -------------------------------------------------------------------------
>
>                 Key: CASSANDRA-21209
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-21209
>             Project: Apache Cassandra
>          Issue Type: Bug
>          Components: Feature/Compression
>            Reporter: Stefan Miklosovic
>            Priority: Normal
>             Fix For: 5.x
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> We should rework the current code to remove the remaining notions of 
> auto-training. We need to create a trainer on demand for each training 
> session rather than keeping a single trainer for the whole manager. That will 
> eventually let us implement auto-training as well, albeit a little 
> differently (and more simply).
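
To make the proposed shape concrete, here is a minimal sketch of a manager 
that creates a fresh trainer for every training session instead of holding one 
for its whole lifetime. All names (Trainer, Manager, runTrainingSession) are 
hypothetical and do not correspond to Cassandra's classes, and train() is a 
placeholder that only counts samples rather than building a ZSTD dictionary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical names throughout; not Cassandra's actual classes.
public class TrainerPerSessionSketch {

    /** Stand-in for a dictionary trainer that collects samples for one run. */
    static final class Trainer {
        private final List<byte[]> samples = new ArrayList<>();
        private boolean finished;

        void addSample(byte[] sample) {
            if (finished)
                throw new IllegalStateException("trainer already finished");
            samples.add(sample.clone());
        }

        /** Placeholder for real training: returns the number of samples seen. */
        int train() {
            finished = true;
            return samples.size();
        }
    }

    /** Manager that owns no trainer; it only knows how to create one. */
    static final class Manager {
        private final Supplier<Trainer> trainerFactory;

        Manager(Supplier<Trainer> trainerFactory) {
            this.trainerFactory = trainerFactory;
        }

        /** Each call gets its own trainer, so no state leaks between sessions. */
        int runTrainingSession(List<byte[]> samples) {
            Trainer trainer = trainerFactory.get();
            for (byte[] sample : samples)
                trainer.addSample(sample);
            return trainer.train();
        }
    }

    public static void main(String[] args) {
        Manager manager = new Manager(Trainer::new);
        System.out.println(manager.runTrainingSession(List.of(new byte[]{1}, new byte[]{2})));
        System.out.println(manager.runTrainingSession(List.of(new byte[]{3})));
    }
}
```

Because the manager keeps only a factory, a later auto-training feature could 
call runTrainingSession on its own schedule without sharing trainer state with 
manually triggered sessions.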



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
