[ https://issues.apache.org/jira/browse/CASSANDRA-15379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16966431#comment-16966431 ]

Joey Lynch commented on CASSANDRA-15379:
----------------------------------------

Alright, I made it so that Zstd, Deflate and LZ4HC (which compresses extremely 
slowly) now flush with LZ4 (fast), controlled via an EnumSet. Since I'm changing 
the ICompressor interface anyway, I figured this is more maintainable than 
a somewhat arbitrary boolean switch.
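
As a rough illustration of the EnumSet approach described above, a minimal sketch might look like the following. Every name here (Use, Compressor, forFlush and so on) is hypothetical and chosen for the example; this is not the code on the branch, and the real ICompressor interface may differ.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch: names are illustrative, not the actual patch.
class FlushCompressorSketch {
    /** Situations a compressor can declare itself suitable for. */
    enum Use { GENERAL, FAST_COMPRESSION }

    /** Minimal stand-in for a compressor interface. */
    interface Compressor {
        String name();
        Set<Use> recommendedUses();
    }

    /** Simple immutable implementation used only for this demonstration. */
    static final class Named implements Compressor {
        private final String name;
        private final Set<Use> uses;

        Named(String name, Set<Use> uses) {
            this.name = name;
            this.uses = uses;
        }

        public String name() { return name; }
        public Set<Use> recommendedUses() { return uses; }
    }

    // Fast compressor: fine for both flush and compaction.
    static final Compressor LZ4 = new Named("LZ4", EnumSet.of(Use.GENERAL, Use.FAST_COMPRESSION));
    // Slower compressors: general use only, so they should not run on the flush path.
    static final Compressor ZSTD = new Named("Zstd", EnumSet.of(Use.GENERAL));
    static final Compressor DEFLATE = new Named("Deflate", EnumSet.of(Use.GENERAL));
    static final Compressor LZ4HC = new Named("LZ4HC", EnumSet.of(Use.GENERAL));

    /** Keeps the configured compressor for flush only if it declares itself fast. */
    static Compressor forFlush(Compressor configured, Compressor fastFallback) {
        return configured.recommendedUses().contains(Use.FAST_COMPRESSION)
             ? configured
             : fastFallback;
    }

    public static void main(String[] args) {
        System.out.println(forFlush(ZSTD, LZ4).name()); // Zstd is not marked fast, so the fallback is used
        System.out.println(forFlush(LZ4, LZ4).name());  // LZ4 is marked fast, so it is kept
    }
}
```

Compared with a boolean such as "isSlow", a set of uses leaves room for further categories later without another interface change, which seems to be the maintainability argument made above.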

I also took the opportunity to add more tests and improve the documentation, 
including some guidance to help people pick a compressor (I hear a lot of 
confusion about why we still have Snappy and Deflate around, so I tried to 
clarify that). I'll squash after review comments are integrated.

||trunk||
|[branch|https://github.com/apache/cassandra/compare/trunk...jolynch:CASSANDRA-15379]|
|[!https://circleci.com/gh/jolynch/cassandra/tree/CASSANDRA-15379.png?circle-token=1102a59698d04899ec971dd36e925928f7b521f5!|https://circleci.com/gh/jolynch/cassandra/tree/CASSANDRA-15379]|

The one failing unit test appears to be 
org.apache.cassandra.config.DatabaseDescriptorRefTest, which I thought was 
supposed to be fixed as part of CASSANDRA-15371. I'll double check tomorrow.

> Make it possible to flush with a different compression strategy than we 
> compact with
> ------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15379
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15379
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local/Compaction, Local/Config, Local/Memtable
>            Reporter: Joey Lynch
>            Assignee: Joey Lynch
>            Priority: Normal
>
> [~josnyder] and I have been testing out CASSANDRA-14482 (Zstd compression) on 
> some of our most dense clusters and have been observing close to 50% 
> reduction in footprint with Zstd on some of our workloads! Unfortunately 
> though we have been running into an issue where the flush might take so long 
> (Zstd is slower to compress than LZ4) that we can actually block the next 
> flush and cause instability.
> Internally we are working around this with a very simple patch which flushes 
> SSTables as the default compression strategy (LZ4) regardless of the table 
> params. This is a simple solution, but I think the ideal solution might be 
> for the flush compression strategy to be configurable separately from the 
> table compression strategy (while defaulting to the same thing). Instead of 
> adding yet another compression option to the yaml (like hints and commitlog) 
> I was thinking of just adding it to the table parameters and then adding a 
> {{default_table_parameters}} yaml option like:
> {noformat}
> # Default table properties to apply on freshly created tables. The currently
> # supported defaults are:
> # * compression       : How are SSTables compressed in general (flush,
> #                       compaction, etc ...)
> # * flush_compression : How are SSTables compressed as they flush
> default_table_parameters:
>   compression:
>     class_name: 'LZ4Compressor'
>     parameters:
>       chunk_length_in_kb: 16
>   flush_compression:
>     class_name: 'LZ4Compressor'
>     parameters:
>       chunk_length_in_kb: 4
> {noformat}
> This would also have the nice effect of giving our configuration a path 
> toward user-specified defaults for table creation (e.g. a user who wants a 
> different default chunk_length_in_kb could set it there).
> So the proposed (~mandatory) scope is:
> * Flush with a faster compression strategy
> I'd like to implement the following at the same time:
> * Per table flush compression configuration
> * Ability to default the table flush and compaction compression in the yaml.
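
The "configurable separately, while defaulting to the same thing" behaviour proposed in the description can be sketched as a simple fallback. This is only an illustration under assumed names (Params, effectiveFlush); the ticket does not specify how the resolution would actually be implemented.

```java
import java.util.Optional;

// Hypothetical sketch: class and method names are illustrative only.
class FlushCompressionDefaults {
    /** Minimal stand-in for one compression setting (class name plus chunk length). */
    static final class Params {
        final String className;
        final int chunkLengthInKb;

        Params(String className, int chunkLengthInKb) {
            this.className = className;
            this.chunkLengthInKb = chunkLengthInKb;
        }

        @Override
        public String toString() {
            return className + " (chunk_length_in_kb: " + chunkLengthInKb + ")";
        }
    }

    /** Returns flush_compression when it is set, otherwise falls back to compression. */
    static Params effectiveFlush(Params compression, Optional<Params> flushCompression) {
        return flushCompression.orElse(compression);
    }

    public static void main(String[] args) {
        Params table = new Params("ZstdCompressor", 16);
        // No flush_compression configured: flush uses the table compression.
        System.out.println(effectiveFlush(table, Optional.empty()));
        // flush_compression configured: flush uses it instead.
        System.out.println(effectiveFlush(table, Optional.of(new Params("LZ4Compressor", 4))));
    }
}
```

The same fallback could be applied one level up, with the yaml default_table_parameters supplying values for tables that set neither option.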



