Joey Lynch created CASSANDRA-15379:
--------------------------------------

             Summary: Make it possible to flush with a different compression 
strategy than we compact with
                 Key: CASSANDRA-15379
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15379
             Project: Cassandra
          Issue Type: Improvement
          Components: Local/Compaction, Local/Config, Local/Memtable
            Reporter: Joey Lynch
            Assignee: Joey Lynch


[~josnyder] and I have been testing CASSANDRA-14482 (Zstd compression) on 
some of our densest clusters and have observed close to a 50% reduction 
in footprint with Zstd on some of our workloads! Unfortunately, we have 
also run into an issue: because Zstd is slower to compress than LZ4, a flush 
can take so long that it blocks the next flush and causes 
instability.

Internally we are working around this with a very simple patch that flushes 
SSTables with the default compression strategy (LZ4) regardless of the table 
params. This is simple, but I think the ideal solution might 
be for the flush compression strategy to be configurable separately from the 
table compression strategy (while defaulting to the same thing). Instead of 
adding yet another compression option to the yaml (like hints and commitlog), I 
was thinking of just adding it to the table parameters and then adding a 
{{default_table_parameters}} yaml option like:
{noformat}

# Default table properties to apply on freshly created tables. The currently
# supported defaults are:
# * compression       : How are SSTables compressed in general (flush,
#                       compaction, etc ...)
# * flush_compression : How are SSTables compressed as they flush
default_table_parameters:
  compression:
    class_name: 'LZ4Compressor'
    parameters:
      chunk_length_in_kb: 16
  flush_compression:
    class_name: 'LZ4Compressor'
    parameters:
      chunk_length_in_kb: 4
{noformat}
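
To make the "defaulting to the same thing" behaviour concrete, here is a minimal sketch of how a writer might pick its compression settings. The function name, the {{for_flush}} flag and the dict layout are hypothetical and only mirror the yaml above; this is not Cassandra's actual code path.

```python
# Hypothetical model of the proposal; none of these names exist in Cassandra.
def resolve_compression(table_params: dict, for_flush: bool) -> dict:
    """Pick the compression settings for an SSTable write.

    A flush uses 'flush_compression' when the table sets it. Every other
    write, and a flush on a table without that option, uses 'compression'.
    """
    if for_flush and table_params.get("flush_compression") is not None:
        return table_params["flush_compression"]
    return table_params["compression"]


# A table that compacts with Zstd but flushes with the faster LZ4.
zstd_table = {
    "compression": {
        "class_name": "ZstdCompressor",
        "parameters": {"chunk_length_in_kb": 16},
    },
    "flush_compression": {
        "class_name": "LZ4Compressor",
        "parameters": {"chunk_length_in_kb": 4},
    },
}

# A table that never set flush_compression keeps today's behaviour.
legacy_table = {
    "compression": {
        "class_name": "LZ4Compressor",
        "parameters": {"chunk_length_in_kb": 16},
    },
}
```

With these inputs, a flush of {{zstd_table}} would use LZ4 while its compactions use Zstd, and {{legacy_table}} would flush and compact with the same settings.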

This would also have the nice effect of giving our configuration a path 
forward to user-specified defaults for table creation (so e.g. if a 
particular user wanted a different default chunk_length_in_kb, they could 
set it there).
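
As a rough illustration of how yaml defaults could feed table creation, the sketch below starts from the defaults and lets options given at creation time take precedence. The names are hypothetical, and replacing a whole option (rather than merging its individual parameters) is an assumption; the actual merge semantics would be decided by the patch.

```python
import copy

# Hypothetical stand-in for the parsed 'default_table_parameters' yaml section.
DEFAULT_TABLE_PARAMETERS = {
    "compression": {
        "class_name": "LZ4Compressor",
        "parameters": {"chunk_length_in_kb": 16},
    },
    "flush_compression": {
        "class_name": "LZ4Compressor",
        "parameters": {"chunk_length_in_kb": 4},
    },
}


def params_for_new_table(create_params: dict,
                         defaults: dict = DEFAULT_TABLE_PARAMETERS) -> dict:
    """Build the parameters for a freshly created table.

    Starts from the yaml defaults; any option given at creation time
    replaces the corresponding default wholesale (assumed semantics).
    """
    merged = copy.deepcopy(defaults)              # never mutate the shared defaults
    merged.update(copy.deepcopy(create_params))   # explicit options win
    return merged
```

For example, creating a table with only a Zstd {{compression}} option would leave {{flush_compression}} at the yaml default, which is the combination this ticket is after.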

So the proposed (~mandatory) scope is:
* Flush with a faster compression strategy

I'd like to implement the following at the same time:
* Per table flush compression configuration
* Ability to default the table flush and compaction compression in the yaml.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
