[ https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713524#comment-17713524 ]

Claude Warren edited comment on CASSANDRA-12937 at 4/18/23 10:09 AM:
---------------------------------------------------------------------

hints_compression and commitlog_compression use the standard ParameterizedClass.

CompressionParams has three fields that it extracts from, or derives from, the parameters in the ParameterizedClass. The fields in CompressionParams are:
{code:java}
private final int chunkLength;
// In content we store max length to avoid rounding errors causing compress/decompress mismatch.
private final int maxCompressedLength;
// In configuration we store min ratio, the input parameter.
private final double minCompressRatio;
{code}
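The two comments above say the compressed-length limit is kept as an integer so that compression and decompression agree, while the ratio is what users configure. A minimal sketch of how the two values relate through chunkLength, assuming maxCompressedLength = ceil(chunkLength / minCompressRatio) and that a ratio of 0 means "no limit"; CompressionRatioMath and its methods are hypothetical names, not the CompressionParams API:

{code:java}
final class CompressionRatioMath {
    private CompressionRatioMath() {}

    // Hypothetical helper: maximum compressed size (bytes) implied by a minimum compress ratio.
    static int maxCompressedLength(int chunkLength, double minCompressRatio) {
        if (minCompressRatio <= 0.0) {
            return Integer.MAX_VALUE; // assumed convention: ratio 0 means "no limit"
        }
        // Round up, clamped to the int range, so the limit is never stricter than the ratio implies.
        return (int) Math.ceil(Math.min(chunkLength / minCompressRatio, Integer.MAX_VALUE));
    }

    // Hypothetical inverse: the ratio implied by a stored maximum compressed length.
    static double minCompressRatio(int chunkLength, int maxCompressedLength) {
        if (maxCompressedLength == Integer.MAX_VALUE) {
            return 0.0;
        }
        return chunkLength / (double) maxCompressedLength;
    }
}
{code}

For example, maxCompressedLength(16384, 1.1) rounds up to 14895, and minCompressRatio(16384, 14895) gives about 1.09997 rather than exactly 1.1, which is the kind of drift that storing the integer avoids.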
The CompressionParams construction from the Map<String,String> of options (the ParameterizedClass parameters) expects a "chunk_length_in_kb" or "chunk_length_kb" key, as well as a "min_compress_ratio" key.
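A minimal sketch of reading those map keys, assuming "chunk_length_in_kb" takes precedence over the older "chunk_length_kb" and that an absent "min_compress_ratio" defaults to 0.0; LegacyChunkOptions is a hypothetical helper, not Cassandra code:

{code:java}
import java.util.Map;

final class LegacyChunkOptions {
    private LegacyChunkOptions() {}

    // Chunk length in KiB from "chunk_length_in_kb", falling back to "chunk_length_kb"; null if neither is set.
    static Integer chunkLengthKiB(Map<String, String> options) {
        String value = options.get("chunk_length_in_kb");
        if (value == null) {
            value = options.get("chunk_length_kb");
        }
        return value == null ? null : Integer.valueOf(value.trim());
    }

    // "min_compress_ratio" as a double; 0.0 (assumed default) when the key is absent.
    static double minCompressRatio(Map<String, String> options) {
        String value = options.get("min_compress_ratio");
        return value == null ? 0.0 : Double.parseDouble(value.trim());
    }
}
{code}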

The change I made does not alter the hints_compression or commitlog_compression options.

The yaml file adds further requirements:
 * chunkLength (yaml: chunk_length) should be specified with a DataStorageSpec suffix (e.g. KiB).
 * maxCompressedLength (yaml: max_compressed_length) should be accepted as a parameter, and it should also be specified with a DataStorageSpec suffix (e.g. KiB).
 * maxCompressedLength and minCompressRatio are related to each other through chunk_length, so only one of them can be specified.
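A minimal sketch of what "specified with a DataStorageSpec suffix" could mean for chunk_length and max_compressed_length. StorageSize is a hypothetical stand-in that accepts only the B, KiB, MiB and GiB suffixes; the actual change would presumably use DataStorageSpec itself:

{code:java}
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class StorageSize {
    private StorageSize() {}

    // Hypothetical stand-in for DataStorageSpec parsing: digits followed by a binary unit suffix.
    private static final Pattern FORMAT = Pattern.compile("(\\d+)(B|KiB|MiB|GiB)");

    // Converts e.g. "16KiB" to bytes; a bare number such as "16" is rejected.
    static long toBytes(String value) {
        Matcher m = FORMAT.matcher(value.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("expected a value with a unit suffix such as 16KiB, got: " + value);
        }
        long quantity = Long.parseLong(m.group(1));
        long multiplier;
        switch (m.group(2)) {
            case "B":   multiplier = 1L; break;
            case "KiB": multiplier = 1L << 10; break;
            case "MiB": multiplier = 1L << 20; break;
            default:    multiplier = 1L << 30; break; // GiB
        }
        return Math.multiplyExact(quantity, multiplier); // throws ArithmeticException on overflow
    }
}
{code}

Making the unit mandatory matches the third collision rule listed later in this comment, which rejects values without a suffix.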

I could work chunkLength and maxCompressedLength into the class_name parameters; however, I believe this would add two more reserved words, both of which would need to be removed from the parameter list. That change would affect every CompressionParams construction that uses the Map<String,String> format.
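A minimal sketch of the cost described above: every Map<String,String> construction would have to pull the reserved keys out before the remaining options are handed to the compressor. ReservedOptions and its key set are assumptions built from the names in this comment; the real reserved list may contain more keys:

{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

final class ReservedOptions {
    private ReservedOptions() {}

    // Hypothetical reserved set: keys named in this comment only.
    static final Set<String> RESERVED = Set.of(
        "chunk_length_in_kb", "chunk_length_kb", "min_compress_ratio", // existing map keys
        "chunk_length", "max_compressed_length");                    // the two proposed additions

    // Copies reserved keys into 'reserved' and returns the remaining compressor-specific options.
    static Map<String, String> splitReserved(Map<String, String> parameters, Map<String, String> reserved) {
        Map<String, String> compressorOptions = new HashMap<>();
        for (Map.Entry<String, String> e : parameters.entrySet()) {
            if (RESERVED.contains(e.getKey())) {
                reserved.put(e.getKey(), e.getValue());
            } else {
                compressorOptions.put(e.getKey(), e.getValue());
            }
        }
        return compressorOptions;
    }
}
{code}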

I will make the change with the following rules for resolving conflicting values:
 * If both max_compressed_length and min_compress_ratio are specified, a ConfigurationException will be thrown.
 * If chunk_length and either chunk_length_in_kb or chunk_length_kb are specified and the values are not equal, a ConfigurationException will be thrown.
 * If chunk_length or max_compressed_length is specified without a DataStorageSpec suffix, a ConfigurationException will be thrown.
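A minimal sketch of the three rules above as a single validation pass. ConfigurationException here is a local stand-in for Cassandra's exception of the same name, CompressionOptionRules is hypothetical, and the suffix check is a simplified regular expression rather than DataStorageSpec:

{code:java}
import java.util.Map;

// Hypothetical stand-in for org.apache.cassandra.exceptions.ConfigurationException.
class ConfigurationException extends RuntimeException {
    ConfigurationException(String message) {
        super(message);
    }
}

final class CompressionOptionRules {
    private CompressionOptionRules() {}

    // Applies the three collision rules; returns normally when the options are consistent.
    static void validate(Map<String, String> options) {
        // Rule 1: max_compressed_length and min_compress_ratio are mutually exclusive.
        if (options.containsKey("max_compressed_length") && options.containsKey("min_compress_ratio")) {
            throw new ConfigurationException("max_compressed_length and min_compress_ratio cannot both be specified");
        }
        // Rule 3 runs before rule 2 so chunk_length is known to be parseable when compared.
        for (String key : new String[] {"chunk_length", "max_compressed_length"}) {
            String value = options.get(key);
            if (value != null && !value.trim().matches("\\d+(B|KiB|MiB|GiB)")) {
                throw new ConfigurationException(key + " must use a DataStorageSpec suffix such as KiB, got: " + value);
            }
        }
        // Rule 2: chunk_length must agree with either legacy KiB key when both are present.
        String chunkLength = options.get("chunk_length");
        if (chunkLength != null) {
            long bytes = bytesOf(chunkLength);
            for (String legacy : new String[] {"chunk_length_in_kb", "chunk_length_kb"}) {
                String legacyValue = options.get(legacy);
                if (legacyValue != null && Long.parseLong(legacyValue.trim()) * 1024L != bytes) {
                    throw new ConfigurationException("chunk_length " + chunkLength + " conflicts with " + legacy + " " + legacyValue);
                }
            }
        }
    }

    // Simplified conversion of an already-validated size string such as "16KiB" to bytes.
    private static long bytesOf(String value) {
        String v = value.trim();
        int i = 0;
        while (i < v.length() && Character.isDigit(v.charAt(i))) {
            i++;
        }
        long quantity = Long.parseLong(v.substring(0, i));
        switch (v.substring(i)) {
            case "B":   return quantity;
            case "KiB": return quantity * 1024L;
            case "MiB": return quantity * 1024L * 1024L;
            default:    return quantity * 1024L * 1024L * 1024L; // GiB
        }
    }
}
{code}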

I will also ensure that the short names lz4, none, noop, snappy, deflate, and zstd work as class names and use the defaults specified by the CompressionParams methods of the same names.



> Default setting (yaml) for SSTable compression
> ----------------------------------------------
>
>                 Key: CASSANDRA-12937
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12937
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local/Config
>            Reporter: Michael Semb Wever
>            Assignee: Claude Warren
>            Priority: Low
>              Labels: AdventCalendar2021, lhf
>             Fix For: 5.x
>
>          Time Spent: 3h
>  Remaining Estimate: 0h
>
> In many situations the choice of compression for sstables is more relevant to 
> the disks attached than to the schema and data.
> This issue is to add to cassandra.yaml a default value for sstable 
> compression that new tables will inherit (instead of the defaults found in
> {{CompressionParams.DEFAULT}}).
> Examples where this can be relevant include filesystems that compress on the
> fly (btrfs, zfs), specific disk configurations, or even specific C* versions
> (see CASSANDRA-10995).
> +Additional information for newcomers+
> Some new fields need to be added to {{cassandra.yaml}} to allow specifying
> the default compression parameters. In {{DatabaseDescriptor}} a new
> {{CompressionParams}} field should be added for the default compression. This
> field should be initialized in {{DatabaseDescriptor.applySimpleConfig()}}. At
> the places where {{CompressionParams.DEFAULT}} was used, the code should call
> {{DatabaseDescriptor#getDefaultCompressionParams}}, which should return a
> copy of the configured {{CompressionParams}}.
> A unit test using {{OverrideConfigurationLoader}} should verify that a newly
> created table's schema uses the new default (see CreateTest for an example).
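The issue asks for a DatabaseDescriptor field initialized in applySimpleConfig() and a getDefaultCompressionParams() that returns a copy. A minimal sketch of that pattern, using hypothetical simplified classes (SimpleCompressionParams, DefaultCompressionConfig) instead of the real Cassandra types; the LZ4 / 16 KiB fallback is only a placeholder for CompressionParams.DEFAULT:

{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical, simplified stand-in for CompressionParams: a class name plus its options.
final class SimpleCompressionParams {
    final String className;
    final Map<String, String> options;

    SimpleCompressionParams(String className, Map<String, String> options) {
        this.className = Objects.requireNonNull(className);
        this.options = new HashMap<>(options); // defensive copy of the caller's map
    }

    SimpleCompressionParams copy() {
        return new SimpleCompressionParams(className, options);
    }
}

// Hypothetical stand-in for the DatabaseDescriptor field and getter described in the issue.
final class DefaultCompressionConfig {
    // Placeholder fallback standing in for CompressionParams.DEFAULT.
    private static SimpleCompressionParams defaultParams =
        new SimpleCompressionParams("LZ4Compressor", Map.of("chunk_length_in_kb", "16"));

    private DefaultCompressionConfig() {}

    // Called once while applying the yaml config (cf. applySimpleConfig()); null keeps the fallback.
    static void apply(SimpleCompressionParams fromYaml) {
        if (fromYaml != null) {
            defaultParams = fromYaml.copy();
        }
    }

    // Each caller gets its own copy, so mutating it cannot change the configured default.
    static SimpleCompressionParams getDefaultCompressionParams() {
        return defaultParams.copy();
    }
}
{code}

Returning a copy means a caller that mutates its params cannot change the default seen by the next table that is created.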



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
