[jira] [Commented] (CASSANDRA-21194) Sampling data for dictionary training on more than Integer.MAX_VALUE bytes is pointless

Stefan Miklosovic (Jira) Mon, 02 Mar 2026 08:03:29 -0800


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-21194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18062204#comment-18062204
 ]


Stefan Miklosovic commented on CASSANDRA-21194:
-----------------------------------------------

https://pre-ci.cassandra.apache.org/job/cassandra/455/#showFailuresLink

> Sampling data for dictionary training on more than Integer.MAX_VALUE bytes is 
> pointless
> ---------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-21194
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-21194
>             Project: Apache Cassandra
>          Issue Type: Improvement
>          Components: Feature/Compression
>            Reporter: Stefan Miklosovic
>            Assignee: Stefan Miklosovic
>            Priority: Normal
>             Fix For: 5.x
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> ZstdDictTrainer from the zstd-jni library we use allocates the buffer for 
> training samples with ByteBuffer.allocateDirect(size), where {{size}} is an 
> int. Integer.MAX_VALUE is 2^31 - 1 bytes, just under 2 GiB. So if a user 
> wants to sample more than that, for example 3 GiB, sampling simply stops at 
> 2 GiB and, judging by the training output, the process looks stuck. We should 
> validate this value before training and reject anything bigger than 2 GiB.
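
The validation proposed above could look roughly like the sketch below. It is only an illustration, not the actual patch: the class name SampleSizeValidator, the method validateSampleSize and the error messages are hypothetical, and it assumes the requested sample size arrives as a long number of bytes before being narrowed to the int that ByteBuffer.allocateDirect(int) accepts.

```java
// Hypothetical helper, not part of Cassandra or zstd-jni.
final class SampleSizeValidator
{
    // Largest capacity a single direct ByteBuffer can have: 2^31 - 1 bytes.
    static final long MAX_SAMPLE_BYTES = Integer.MAX_VALUE;

    private SampleSizeValidator() {}

    // Returns the requested size narrowed to int, or throws if it cannot
    // be represented as a ByteBuffer capacity.
    static int validateSampleSize(long requestedBytes)
    {
        if (requestedBytes <= 0)
            throw new IllegalArgumentException("Sample size must be positive, got " + requestedBytes + " bytes");

        if (requestedBytes > MAX_SAMPLE_BYTES)
            throw new IllegalArgumentException("Sample size of " + requestedBytes + " bytes exceeds the maximum of " +
                                               MAX_SAMPLE_BYTES + " bytes");

        return (int) requestedBytes;
    }
}
```

With a check of this kind, a request for 3 GiB would fail fast with an error message instead of sampling silently stopping at the buffer limit.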



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
