[ 
https://issues.apache.org/jira/browse/CASSANDRA-17381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17492105#comment-17492105
 ] 

Paulo Motta commented on CASSANDRA-17381:
-----------------------------------------

Hi [~gimhana.ds], I think Joey added some starting instructions in his previous 
comment:

> Warm up tasks include pulling the branch, rebasing it against 3.0, getting it 
> to compile if there are issues, and starting up a local Cassandra node with a 
> table configured to use the new compaction strategy with DEBUG logging on to 
> observe the choices. 
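
As a rough illustration of the last warm-up step, the sketch below builds the CQL statement that would switch a table to the new strategy. The fully qualified class name, the keyspace and table names, and the option value are assumptions made for the example and are not taken from the patch; only the option name max_read_per_read appears in the ticket. The logger name passed to nodetool is likewise a guess at where the strategy's DEBUG output would come from.

```python
# Hypothetical names: check the branch for the real package and class name.
ASSUMED_STRATEGY_CLASS = (
    "org.apache.cassandra.db.compaction.BoundedReadCompactionStrategy"
)

# nodetool setlogginglevel is an existing command; the logger chosen here
# is an assumption about where the strategy logs its choices.
ASSUMED_DEBUG_COMMAND = (
    "nodetool setlogginglevel org.apache.cassandra.db.compaction DEBUG"
)


def compaction_cql(keyspace, table, strategy_class, options):
    """Build an ALTER TABLE statement that sets a table's compaction strategy.

    `options` maps strategy option names to values; CQL expects every value
    in the compaction map to be a quoted string, so values are stringified.
    """
    opts = {"class": strategy_class}
    opts.update({name: str(value) for name, value in options.items()})
    body = ", ".join(f"'{name}': '{value}'" for name, value in opts.items())
    return f"ALTER TABLE {keyspace}.{table} WITH compaction = {{{body}}};"


if __name__ == "__main__":
    # Placeholder keyspace, table, and option value, for illustration only.
    print(compaction_cql("demo_ks", "demo_table", ASSUMED_STRATEGY_CLASS,
                         {"max_read_per_read": 4}))
    print(ASSUMED_DEBUG_COMMAND)
```

The printed statement can be pasted into cqlsh against the local node once the branch compiles.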

> Produce and verify BoundedReadCompactionStrategy as a unified general purpose 
> compaction algorithm
> --------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-17381
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17381
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local/Compaction
>            Reporter: Joey Lynch
>            Assignee: Joey Lynch
>            Priority: Normal
>              Labels: gsoc, gsoc2022
>
> The existing compaction strategies have a number of drawbacks that make all 
> three unsuitable as a general-purpose compaction strategy. For example, STCS 
> creates giant files that are hard to back up, hurt read performance and the 
> page cache, and led to many of the early re-open bugs. LCS improved 
> dramatically on this but has issues of its own, e.g. the lack of a performant 
> full compaction, and problems caused by the strict leveling (e.g. with bulk 
> loading) when writes exceed the rate at which we can do the L0 - L1 promotion.
> In this 
> [talk|https://github.com/ngcc/ngcc2019/blob/master/NextGenerationCassandraCompactionGoingBeyondLCS.pdf] 
> I proposed a novel compaction strategy that aims to expose a single tunable 
> that the user can control for the read amplification. Raise the 
> min_threshold_levels and you trade off read/space performance for write 
> performance. Since then a proof of concept 
> [patch|https://github.com/jolynch/cassandra/tree/jolynch_bounded_read_final] 
> has been published along with some rudimentary 
> [documentation|https://gist.github.com/jolynch/9118465f32ad5298b4e39d03ccd4370e], 
> but this is still not tracked in Jira.
> The remaining work here is:
> 1. Validate that the algorithm is correct via test suites, performance 
> testing, stress testing, and benchmarking with OSS tools (e.g. cassandra-stress, 
> [tlp-stress|https://github.com/thelastpickle/tlp-stress], or 
> [ndbench|https://github.com/Netflix/ndbench]). When issues are found (there 
> likely will be issues as the patch is a PoC), devise how to adjust the 
> algorithm and implementation appropriately. The key metric of success is that 
> we can run Cassandra stably for more than 24 hours while applying sustained 
> load, with minimal compaction load (and with compaction keeping up).
> 2. Do more in-depth experiments measuring performance across a wide range of 
> workloads (e.g. write heavy, read heavy, balanced, time series, register 
> update, etc.) and in comparison with LCS (leveled), STCS (size tiered), 
> and TWCS (time window). Key metrics of success are establishing that as we 
> tune max_read_per_read we should get more predictable read latency under low 
> system load (ρ < 30%) while not degrading at high system load (ρ > 70%), and 
> we should match LCS performance under low load while doing better at high 
> load.
> Once this is validated, a Cassandra blog post reporting on the findings 
> (positive or negative) may be advisable.
>  
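
One way the step 2 success criteria could be checked from benchmark output is sketched below. It assumes each sample is a (ρ, read latency in ms) pair, uses p99 latency as a stand-in for "predictable", and treats "match" as "within a small tolerance". All three are assumptions chosen for illustration; only the 30% and 70% load thresholds come from the ticket.

```python
from statistics import quantiles

LOW_LOAD = 0.30   # rho below this is "low system load" (threshold from the ticket)
HIGH_LOAD = 0.70  # rho above this is "high system load" (threshold from the ticket)


def p99(latencies):
    """99th percentile of a list of latencies (needs at least two samples)."""
    return quantiles(latencies, n=100)[98]


def summarize(samples):
    """Bucket (rho, latency_ms) samples by load and return p99 per bucket.

    Samples with 0.30 <= rho <= 0.70 are ignored, since the ticket only
    states criteria for the low and high load regimes.
    """
    buckets = {"low": [], "high": []}
    for rho, latency_ms in samples:
        if rho < LOW_LOAD:
            buckets["low"].append(latency_ms)
        elif rho > HIGH_LOAD:
            buckets["high"].append(latency_ms)
    return {name: p99(vals) for name, vals in buckets.items() if len(vals) >= 2}


def meets_criteria(candidate, lcs, tolerance=0.05):
    """Compare summarize() output for the new strategy against LCS.

    Assumed reading of the ticket: roughly match LCS at low load (within
    `tolerance`, a made-up default) and be strictly better at high load.
    """
    matches_low = candidate["low"] <= lcs["low"] * (1 + tolerance)
    better_high = candidate["high"] < lcs["high"]
    return matches_low and better_high
```

Feeding it samples collected with cassandra-stress or tlp-stress for each strategy would give a pass/fail signal per tuning of max_read_per_read, though a real evaluation would likely look at the full latency distribution as well.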



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
