[jira] [Updated] (CASSANDRA-21194) Sampling data for dictionary training on more than Integer.MAX_VALUE bytes is pointless

Stefan Miklosovic (Jira) Mon, 02 Mar 2026 12:08:51 -0800


     [ https://issues.apache.org/jira/browse/CASSANDRA-21194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Stefan Miklosovic updated CASSANDRA-21194:
------------------------------------------
    Status: Ready to Commit  (was: Review In Progress)

> Sampling data for dictionary training on more than Integer.MAX_VALUE bytes is 
> pointless
> ---------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-21194
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-21194
>             Project: Apache Cassandra
>          Issue Type: Improvement
>          Components: Feature/Compression
>            Reporter: Stefan Miklosovic
>            Assignee: Stefan Miklosovic
>            Priority: Normal
>             Fix For: 5.x
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> ZstdDictTrainer from the zstd-jni library we use allocates its buffer for 
> training samples with ByteBuffer.allocateDirect(size), where {{size}} is an 
> int. Integer.MAX_VALUE is 2^31 - 1 bytes, i.e. just under 2 GiB. So if a user 
> wants to sample more, say 3 GiB, sampling just stops at 2 GiB and the training 
> output makes it look like it is stuck. We should validate this value before 
> training and reject anything bigger than Integer.MAX_VALUE bytes.
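
For illustration only, the validation described above could look roughly like
the sketch below. The class name, method name and error message are
hypothetical and are not taken from the Cassandra patch; the only facts assumed
are that the requested sampling size arrives as a long number of bytes and that
ByteBuffer.allocateDirect takes an int capacity.

```java
final class SamplingSizeValidator {
    // Largest capacity that ByteBuffer.allocateDirect(int) can accept:
    // 2^31 - 1 bytes, just under 2 GiB.
    static final long MAX_SAMPLING_BYTES = Integer.MAX_VALUE;

    private SamplingSizeValidator() {}

    /**
     * Hypothetical helper: rejects sizes that cannot fit in an int-sized
     * direct buffer and returns the size narrowed to int otherwise.
     */
    static int validateSamplingSize(long requestedBytes) {
        if (requestedBytes <= 0 || requestedBytes > MAX_SAMPLING_BYTES) {
            throw new IllegalArgumentException(
                "Sampling size must be between 1 and " + MAX_SAMPLING_BYTES
                + " bytes, got " + requestedBytes);
        }
        // Safe narrowing: the range check above guarantees the value fits.
        return (int) requestedBytes;
    }
}
```

With such a check in place, a request for 3 GiB would fail up front with an
IllegalArgumentException instead of silently stopping at the 2 GiB mark.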



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

