[ 
https://issues.apache.org/jira/browse/CASSANDRA-20918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Semb Wever updated CASSANDRA-20918:
-------------------------------------------
    Fix Version/s: 5.x
                       (was: 6.x)

> Add cursor-based low allocation optimized compaction implementation
> -------------------------------------------------------------------
>
>                 Key: CASSANDRA-20918
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20918
>             Project: Apache Cassandra
>          Issue Type: New Feature
>          Components: Local/Compaction, Local/SSTable
>            Reporter: Josh McKenzie
>            Assignee: Nitsan Wakart
>            Priority: Normal
>             Fix For: 5.x
>
>         Attachments: 7_100m_100kr_100r.png, compact_after_t1.zip, 
> compact_after_t1_profiles.zip, compact_before_t1.zip
>
>          Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> Compaction does a ton of allocation and burns a lot of CPU in the process; we 
> can move away from allocation with some fairly simple and straightforward 
> reusable objects and infrastructure that make use of that, reducing 
> allocation and thus CPU usage during compaction. Heap allocation on all 
> test-cases holds steady at 20MB while regular compaction grows up past 5+GB.
> This patch introduces a collection of reusable objects:
>  * Reusable key/token implementations for Murmur and Local partitioners
>  * ReusableLivenessInfo
>  * ReusableDeleteTime
>  * Partition/Clustering/UnfilteredDescriptor
> And new compaction structures that make use of those objects:
>  * CursorCompactionPipeline
>  * CursorCompactor
>  * SSTableCursorReader
>  * SSTableCursorWriter
> There's quite a bit of test code added, benchmarks, etc on the linked branch.
> ~13k added, 405 lines deleted
> ~8.3k lines delta are non-test code
> ~5k lines delta are test code
> Attaching a screenshot of the "messiest" benchmark case with mixed size rows 
> and full merge; across various data and compaction mixes the highlight is 
> that compaction as implemented here is roughly 3-5x faster in most scenarios 
> and uses 20mb on heap vs. multiple GB.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to