[
https://issues.apache.org/jira/browse/CASSANDRA-20158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17936608#comment-17936608
]
Ariel Weisberg edited comment on CASSANDRA-20158 at 3/18/25 7:20 PM:
---------------------------------------------------------------------
[This is the branch I was referring
to.|https://github.com/aweisberg/cassandra/commit/c0ecf42612f69597c292869897e7dbb707daaa8c]
> IntervalTree should support copyAndReplace for checkpoint when ranges are
> unchanged
> -----------------------------------------------------------------------------------
>
> Key: CASSANDRA-20158
> URL: https://issues.apache.org/jira/browse/CASSANDRA-20158
> Project: Apache Cassandra
> Issue Type: Improvement
> Components: Local/Compaction, Local/SSTable
> Reporter: Yuqi Yan
> Assignee: Yuqi Yan
> Priority: Normal
> Fix For: 4.1.x
>
> Attachments: image-2024-12-20-02-39-53-420.png,
> image-2024-12-20-02-41-06-544.png, image-2024-12-20-02-42-52-003.png
>
> Time Spent: 1h 50m
> Remaining Estimate: 0h
>
> We observed very slow compaction, and sometimes stuck memtable flushes that
> caused write latency spikes, when the cluster has a large number of SSTables
> (~20K), similar to what was observed in CASSANDRA-19596.
> Looking deeper into when the interval tree is rebuilt, there is actually
> no need to rebuild it on every checkpoint() call.
>
> updateLiveSet(toUpdate, staged.update) updates the current version
> of each SSTableReader within the View. However, the update does not always
> change the range (low, high) of the SSTableReader.
>
> One example:
> * IndexSummaryRedistribution.adjustSamplingLevels()
> ** SSTableReader replacement = sstable.cloneWithNewSummarySamplingLevel(cfs,
> entry.newSamplingLevel);
> ** This changes only the metadata; the ranges are unchanged
>
> Given this, rebuilding the entire IntervalTree is not always required;
> instead, IntervalTree should support replacing these SSTableReaders.
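As a rough illustration of the condition described above, the check below decides whether a batch of replacements keeps every range unchanged. This is only a sketch: `CheckpointDecisionSketch`, `Reader`, and `canReplaceWithoutRebuild` are hypothetical names, and `Reader` is a simplified stand-in for SSTableReader with string bounds, not Cassandra's actual API.

```java
import java.util.Map;

// Hypothetical sketch; none of these names come from the Cassandra code base.
final class CheckpointDecisionSketch {
    // Simplified stand-in for an SSTableReader: range bounds plus some metadata.
    static final class Reader {
        final String first, last;   // lower and upper bound of the covered range
        final int samplingLevel;    // metadata that may change without moving the range
        Reader(String first, String last, int samplingLevel) {
            this.first = first; this.last = last; this.samplingLevel = samplingLevel;
        }
    }

    // Returns true when every replacement covers exactly the same range as the
    // reader it replaces, so the tree shape would not need to change.
    static boolean canReplaceWithoutRebuild(Map<Reader, Reader> replacements) {
        for (Map.Entry<Reader, Reader> e : replacements.entrySet()) {
            Reader before = e.getKey();
            Reader after = e.getValue();
            if (!before.first.equals(after.first) || !before.last.equals(after.last))
                return false;
        }
        return true;
    }
}
```

A caller would fall back to a full rebuild whenever this returns false.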
>
> If we rebuild the tree, the complexity is O(n(logn)^2) in current trunk, as
> the O(nlogn) sort is repeated on every node creation. After
> CASSANDRA-19596 this will be O(nlogn). With replacement supported, some of
> the updateLiveSet calls can be optimized to O(m(logn)^2), where m is the
> number of SSTableReaders we attempt to replace and, in most cases, m << n
> (the number of SSTables).
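To get a feel for these bounds, the sketch below plugs numbers into the three expressions. It assumes base-2 logarithms and constant factors of 1, which big-O notation does not actually fix, so the results are only relative orders of magnitude. `CostSketch` and its method names are hypothetical.

```java
// Hypothetical back-of-the-envelope model; constants and log base are assumptions.
final class CostSketch {
    static double log2(double x) {
        return Math.log(x) / Math.log(2);
    }

    // Full rebuild on current trunk: n * (log n)^2
    static double rebuildTrunk(long n) {
        return n * log2(n) * log2(n);
    }

    // Full rebuild after CASSANDRA-19596: n * log n
    static double rebuildAfter19596(long n) {
        return n * log2(n);
    }

    // Replacing m readers without a rebuild: m * (log n)^2
    static double replace(long m, long n) {
        return m * log2(n) * log2(n);
    }

    // Under this model, replacement wins while m < n / log n.
    static double breakEvenM(long n) {
        return n / log2(n);
    }
}
```

Under these assumptions, with n = 20000 the break-even point is roughly m = 1400, so replacing a handful of readers is far cheaper than either rebuild, while replacing thousands at once would not be.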
>
> This is achieved by
> # finding the node containing the SSTable (logn)
> # binary searching for the SSTableReader within the node and replacing it
> (logn)
> # To support CAS update, 1 and 2 need to be done by copying the path and
> re-creating the affected nodes on the path
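The three steps above can be sketched on a simplified, immutable centered interval tree. This is not the patch itself: `PathCopySketch`, `Item`, and `Node` are hypothetical names, ranges are plain ints, and each node keeps only one list sorted by lower bound (a real implementation would likely also maintain a list sorted by upper bound). Copying the node's list is linear in that node's size in this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical, simplified stand-in for an interval tree; not Cassandra's class.
final class PathCopySketch {
    // Stand-in for an SSTableReader: a closed range [min, max] plus a version tag.
    static final class Item {
        final int min, max;
        final String version;
        Item(int min, int max, String version) { this.min = min; this.max = max; this.version = version; }
    }

    // Immutable node holding the items whose range contains `center`, sorted by min.
    static final class Node {
        final int center;
        final List<Item> byMin;
        final Node left, right;
        Node(int center, List<Item> byMin, Node left, Node right) {
            this.center = center; this.byMin = byMin; this.left = left; this.right = right;
        }
    }

    // Full rebuild: choose a median endpoint as center and split the items around it.
    static Node build(List<Item> items) {
        if (items.isEmpty()) return null;
        List<Integer> endpoints = new ArrayList<>();
        for (Item i : items) { endpoints.add(i.min); endpoints.add(i.max); }
        Collections.sort(endpoints);
        int center = endpoints.get(endpoints.size() / 2);
        List<Item> l = new ArrayList<>(), r = new ArrayList<>(), here = new ArrayList<>();
        for (Item i : items) {
            if (i.max < center) l.add(i);
            else if (i.min > center) r.add(i);
            else here.add(i);
        }
        here.sort(Comparator.comparingInt((Item i) -> i.min));
        return new Node(center, here, build(l), build(r));
    }

    // Returns a new root in which `old` is swapped for `replacement`; the old tree is untouched.
    static Node copyAndReplace(Node root, Item old, Item replacement) {
        if (old.min != replacement.min || old.max != replacement.max)
            throw new IllegalArgumentException("range changed; a full rebuild is needed");
        return replace(root, old, replacement);
    }

    private static Node replace(Node node, Item old, Item replacement) {
        if (node == null) throw new IllegalArgumentException("item not in tree");
        // Step 1: descend toward the node that holds the item, copying the path (step 3).
        if (old.max < node.center)
            return new Node(node.center, node.byMin, replace(node.left, old, replacement), node.right);
        if (old.min > node.center)
            return new Node(node.center, node.byMin, node.left, replace(node.right, old, replacement));
        // Step 2: binary search for the first entry with min >= old.min.
        int lo = 0, hi = node.byMin.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (node.byMin.get(mid).min < old.min) lo = mid + 1; else hi = mid;
        }
        // Several items may share the same min, so scan the ties for the exact instance.
        for (int i = lo; i < node.byMin.size() && node.byMin.get(i).min == old.min; i++) {
            if (node.byMin.get(i) == old) {
                List<Item> copy = new ArrayList<>(node.byMin);
                copy.set(i, replacement);
                return new Node(node.center, copy, node.left, node.right);
            }
        }
        throw new IllegalArgumentException("item not in tree");
    }
}
```

Because only the nodes on the path are re-created and every other subtree is shared, the new root could then be published with a compare-and-set (for example on an `AtomicReference<Node>`), with readers of the old root seeing a consistent tree throughout.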
>
> The experiment I ran used a 2-ring setup: one ring on 4.1+CASSANDRA-19596
> (marked as trunk), and the other on 4.1+CASSANDRA-19596+this patch (marked
> as new). Each had ~15K SSTables (LCS, 50MB sstable size, single_uplevel
> enabled) under a stress test with a 1:1 rw ratio.
> The results show that ~15% of the checkpoint calls do not need to
> rebuild the tree.
> !image-2024-12-20-02-41-06-544.png|width=803,height=133!
> Compaction throughput (scanner read throughput) increased from 130MB/s to
> 200MB/s
> !image-2024-12-20-02-39-53-420.png|width=1378,height=136!
> Mean checkpoint finish time dropped from ~1.5s to ~800ms
> !image-2024-12-20-02-42-52-003.png|width=1100,height=186!