[ https://issues.apache.org/jira/browse/CASSANDRA-15379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17089243#comment-17089243 ]

Joey Lynch commented on CASSANDRA-15379:
----------------------------------------

*Zstd Write Mostly Read Rarely Benchmark*:

In this test I configured Zstd the way we do in production for our write-mostly, 
read-rarely (e.g. trace) datasets, where Zstd really shines at cutting the 
footprint significantly (up to 50% in some cases). Of our benchmarks so far, this 
one simulates our production Zstd workloads most accurately.
 * Load pattern: 3.6k wps and 1.2k rps at LOCAL_ONE consistency with a random 
access pattern.
 * Data sizing: ~50 million partitions, each with 2 rows of 10 columns and a 
total size of about 4 KiB of random data; ~300 GiB of data per node 
(replicated 6 ways)
 * Compaction settings: STCS with min=8, max=32
 * Compression: Zstd level 10 with 256 KiB block size

{noformat}
compaction = {'class': 
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
'max_threshold': '32', 'min_threshold': '8'}
compression = {'chunk_length_in_kb': '256', 'class': 
'org.apache.cassandra.io.compress.ZstdCompressor', 'compression_level': '10'}
{noformat}
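For reference, a table with roughly these settings could be declared with CQL 
along the following lines (the keyspace, table, and column names here are 
hypothetical placeholders, not the actual benchmark schema):
{noformat}
-- Hypothetical sketch only: schema names are made up, but the compaction and
-- compression options mirror the benchmark configuration above.
CREATE TABLE IF NOT EXISTS bench.trace_data (
    pk text,
    ck text,
    c0 blob, c1 blob, c2 blob, c3 blob, c4 blob,
    c5 blob, c6 blob, c7 blob, c8 blob, c9 blob,
    PRIMARY KEY (pk, ck)
) WITH compaction = {'class': 'SizeTieredCompactionStrategy',
                     'min_threshold': '8', 'max_threshold': '32'}
  AND compression = {'class': 'ZstdCompressor',
                     'chunk_length_in_kb': '256', 'compression_level': '10'};
{noformat}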
*Zstd Write Mostly Read Rarely Benchmark Results*:

The candidate branch did significantly better in every respect. Most importantly, 
the baseline cluster fell further and further behind and started queueing/dropping 
mutations, while the candidate deferred the expensive work to compaction. 
Flamegraphs confirmed that the vast majority of the flusher threads' on-CPU time 
was spent in Zstd compression. Some data to support this conclusion:
 [^15379_request_queueing_zstd_level10.png]
 [^15379_message_drops_zstd_level10.png]
 [^15379_coordinator_zstd_level10.png]
 [^15379_flush_flamegraph_zstd_level10.png]
 [^15379_concurrent_flushes_zstd_level10.png]
 [^15379_backfill_duration_zstd_level10.png]
 [^15379_backfill_drops_zstd_level10.png]
 [^15379_backfill_queueing_zstd_level10.png]
 [^15379_backfill_zstd_level10.png]

This data clearly shows that the baseline, which used Zstd on flush, flushed so 
slowly that it became unstable, just as we observed in production at Netflix. The 
candidate, which flushed data with LZ4 and deferred the expensive Zstd 
compression to compaction, fared significantly better and remained relatively 
stable.

> Make it possible to flush with a different compression strategy than we 
> compact with
> ------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15379
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15379
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local/Compaction, Local/Config, Local/Memtable
>            Reporter: Joey Lynch
>            Assignee: Joey Lynch
>            Priority: Normal
>             Fix For: 4.0-alpha
>
>         Attachments: 15379_backfill_drops_zstd_level10.png, 
> 15379_backfill_duration_zstd_level10.png, 
> 15379_backfill_queueing_zstd_level10.png, 15379_backfill_zstd_level10.png, 
> 15379_baseline_flush_trace.png, 15379_candidate_flush_trace.png, 
> 15379_concurrent_flushes_zstd_level10.png, 15379_coordinator_defaults.png, 
> 15379_coordinator_zstd_defaults.png, 15379_coordinator_zstd_level10.png, 
> 15379_flush_flamegraph_zstd_level10.png, 
> 15379_message_drops_zstd_level10.png, 15379_replica_defaults.png, 
> 15379_replica_zstd_defaults.png, 15379_request_queueing_zstd_level10.png, 
> 15379_system_defaults.png, 15379_system_zstd_defaults.png
>
>
> [~josnyder] and I have been testing out CASSANDRA-14482 (Zstd compression) on 
> some of our densest clusters and have been observing close to 50% reduction in 
> footprint with Zstd on some of our workloads! Unfortunately, we have been 
> running into an issue where a flush can take so long (Zstd is slower to 
> compress than LZ4) that it blocks the next flush and causes instability.
> Internally we are working around this with a very simple patch which flushes 
> SSTables with the default compression strategy (LZ4) regardless of the table 
> params. This works, but I think the ideal solution might be for the flush 
> compression strategy to be configurable separately from the table compression 
> strategy (while defaulting to the same thing). Instead of adding yet another 
> compression option to the yaml (like hints and commitlog have), I was thinking 
> of just adding it to the table parameters and then adding a 
> {{default_table_parameters}} yaml option like:
> {noformat}
> # Default table properties to apply to freshly created tables. The currently
> # supported defaults are:
> # * compression       : How SSTables are compressed in general (flush,
> #                       compaction, etc ...)
> # * flush_compression : How SSTables are compressed as they flush
> default_table_parameters:
>   compression:
>     class_name: 'LZ4Compressor'
>     parameters:
>       chunk_length_in_kb: 16
>   flush_compression:
>     class_name: 'LZ4Compressor'
>     parameters:
>       chunk_length_in_kb: 4
> {noformat}
> This would also have the nice effect of giving our configuration a path 
> forward for providing user-specified defaults for table creation (so e.g. if a 
> particular user wanted to use a different default chunk_length_in_kb they 
> could do that).
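> To make the idea concrete, here is a purely hypothetical CQL sketch of what a 
> per-table flush compression option could look like. Note that a 
> {{flush_compression}} table option does not exist today; the names and syntax 
> are illustrative only. The Zstd settings match the benchmark configuration 
> discussed in this ticket, and the LZ4 flush settings match the yaml sketch 
> above:
> {noformat}
> -- Hypothetical syntax: the table keeps Zstd at rest, while flush uses cheap LZ4.
> ALTER TABLE ks.tbl
>   WITH compression = {'class': 'ZstdCompressor',
>                       'chunk_length_in_kb': '256', 'compression_level': '10'}
>   AND flush_compression = {'class': 'LZ4Compressor',
>                            'chunk_length_in_kb': '4'};
> {noformat}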
> So the proposed (~mandatory) scope is:
> * Flush with a faster compression strategy
> I'd like to implement the following at the same time:
> * Per table flush compression configuration
> * Ability to default the table flush and compaction compression in the yaml.



