[ 
https://issues.apache.org/jira/browse/CASSANDRA-20158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17936597#comment-17936597
 ] 

Ariel Weisberg commented on CASSANDRA-20158:
--------------------------------------------

I wanted binary search to be able to find the values by identity to aid in 
removal/replacement and enforcing no duplicates. It's not required if you build 
the entire tree which is what other things do and they are allowed to have 
duplicates.

I'm waiting on approval, then I will post a branch that shows how I integrated 
your changes. Even when taking the faster path to build the tree it still 
maintains a sorted min order and max order array which is slow, but it 
leverages binary search so most of the work is just straight memory copies 
which should have good performance in comparison to the sorting and repeated 
comparisons.

> IntervalTree should support copyAndReplace for checkpoint when ranges are 
> unchanged
> -----------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-20158
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20158
>             Project: Apache Cassandra
>          Issue Type: Improvement
>          Components: Local/Compaction, Local/SSTable
>            Reporter: Yuqi Yan
>            Assignee: Yuqi Yan
>            Priority: Normal
>             Fix For: 4.1.x
>
>         Attachments: image-2024-12-20-02-39-53-420.png, 
> image-2024-12-20-02-41-06-544.png, image-2024-12-20-02-42-52-003.png
>
>          Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> We observed very slow compaction and sometimes stuck memtable flushing hence 
> caused write latency spikes when the cluster has large number of SSTables 
> (~20K), similar to what was observed in CASSANDRA-19596.
> Looking deeper into when these interval tree is rebuilt - there is actually 
> no need to do rebuild all the time for checkpoint() calls.
>  
> updateLiveSet(toUpdate, staged.update) this is updating the current version 
> of SSTableReader within the View. However the update isn't always changing 
> the ranges of the SSTableReader (low, high).
>  
> One Example:
>  * IndexSummaryRedistribution.adjustSamplingLevels()
>  ** SSTableReader replacement = sstable.cloneWithNewSummarySamplingLevel(cfs, 
> entry.newSamplingLevel);
>  ** This is changing the Metadata only and the ranges are unchanged
>  
> Considering this, rebuilding the entire IntervalTree will not be required, 
> instead IntervalTree should support replacing these SSTableReader.
>  
> If we're rebuilding the tree, complexity is O(n(logn)^2) in current trunk as 
> we're repeating the O(nlogn) sort on every node creation, after 
> CASSANDRA-19596 this will be O(nlogn), but with update supported, some of the 
> updateLiveSet calls can be optimized to O(m(logn)^2) where m is the number of 
> SSTableReaders we attempt to replace, which we have m << n (number of 
> SSTables) in most cases.
>  
> This is achieved by
>  # finding the node containing the SSTable (logn)
>  # binary search and replacing the SSTableReader from the node (logn)
>  # To support CAS update, 1 and 2 need to done by copying the path and 
> re-create the affected nodes on the path
>  
> The experiment I did was on a 2 rings setup, one on 4.1+CASSANDRA-19596 
> (marked as trunk), and the other on  4.1+CASSANDRA-19596+this patch (marked 
> as new). ~15K SSTables (with LCS, sstable size was 50MB, single_uplevel 
> enabled). stress-test with 1:1 rw ratio.
> Result shows that ~15% of the checkpoint calls don't necessarily need to 
> rebuild the tree.
> !image-2024-12-20-02-41-06-544.png|width=803,height=133!
> Compaction throughput (scanner read throughput) was increased from 130MB/s to 
> 200MB/s
> !image-2024-12-20-02-39-53-420.png|width=1378,height=136!
> Checkpoint finish time reduced from mean ~1.5s to ~800ms
> !image-2024-12-20-02-42-52-003.png|width=1100,height=186!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to