[jira] [Commented] (CASSANDRA-18773) Compactions are slow

Cameron Zemek (Jira) Mon, 28 Aug 2023 22:05:05 -0700


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-18773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17759820#comment-17759820
 ]


Cameron Zemek commented on CASSANDRA-18773:
-------------------------------------------

[^18773.patch]

I took your idea above and implemented a preserveOrder method on MergeIterator,
which the CompactionIterator implementation disables when there is no index.
{noformat}
INFO  [CompactionExecutor:2] 2023-08-28 22:19:37,162 CompactionTask.java:239 - Read=53.93% 7.03 MiB/s, Write=20.47% 7.31 MiB/s
INFO  [CompactionExecutor:2] 2023-08-28 22:20:37,162 CompactionTask.java:239 - Read=54.94% 6.97 MiB/s, Write=20.42% 7.24 MiB/s
INFO  [CompactionExecutor:2] 2023-08-28 22:21:37,162 CompactionTask.java:239 - Read=53.69% 6.82 MiB/s, Write=22.33% 7.08 MiB/s{noformat}
This gives essentially the same results as my proof of concept.

[~blambov], what do you think about using background threads in compactions to
decouple reading from writing? That change also gives a noticeable increase in
throughput (around 40%):
{noformat}
INFO  [CompactionExecutor:2] 2023-08-28 21:08:08,463 CompactionTask.java:266 - Read=37.27% 9.63 MiB/s, Write=28.22% 10 MiB/s
INFO  [CompactionExecutor:2] 2023-08-28 21:09:08,463 CompactionTask.java:266 - Read=37.93% 9.65 MiB/s, Write=27.87% 10.02 MiB/s{noformat}
This copies the rows into memory and passes them across to the writer, so the
reader can advance its file positions. For example:
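{code:java}
        // Drain the rows into memory so the reader can move on while the
        // writer consumes this copy on another thread.
        ArrayList<Unfiltered> rows = new ArrayList<>();
        while (rowIterator.hasNext())
        {
            rows.add(rowIterator.next());
        }{code}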
So there is a tradeoff: higher throughput in exchange for the extra memory and
copying needed to buffer the rows.
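A minimal sketch of the read/write decoupling described above, assuming a plain producer/consumer design: a background reader thread drains each partition into a List and hands it to the writer (the calling thread) through a bounded ArrayBlockingQueue. This is not the code from 18773.patch and uses no Cassandra API; the class name DecoupledCompactionSketch, the END sentinel, and modelling rows as Strings are hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

// Hypothetical sketch, not Cassandra code: rows are modelled as Strings and a
// partition as an Iterator<String>.
public final class DecoupledCompactionSketch
{
    // Sentinel compared by identity; marks the end of the stream.
    private static final List<String> END = new ArrayList<>();

    public static void run(Iterator<Iterator<String>> partitions,
                           Consumer<List<String>> writer,
                           int queueCapacity) throws InterruptedException
    {
        // Bounded queue: the reader blocks once it is queueCapacity partitions ahead.
        BlockingQueue<List<String>> queue = new ArrayBlockingQueue<>(queueCapacity);
        AtomicReference<Throwable> failure = new AtomicReference<>();

        Thread reader = new Thread(() -> {
            try
            {
                while (partitions.hasNext())
                {
                    Iterator<String> rowIterator = partitions.next();
                    // Copy the rows so the reader can move past this partition.
                    List<String> rows = new ArrayList<>();
                    while (rowIterator.hasNext())
                        rows.add(rowIterator.next());
                    queue.put(rows);
                }
            }
            catch (Throwable t)
            {
                failure.set(t);
            }
            finally
            {
                try
                {
                    queue.put(END); // always signal the writer, even on failure
                }
                catch (InterruptedException e)
                {
                    Thread.currentThread().interrupt();
                }
            }
        }, "compaction-reader");
        reader.setDaemon(true);
        reader.start();

        // The calling thread acts as the writer.
        for (List<String> rows = queue.take(); rows != END; rows = queue.take())
            writer.accept(rows);

        reader.join();
        if (failure.get() != null)
            throw new RuntimeException("reader failed", failure.get());
    }
}
```

The queue capacity is where the memory side of the tradeoff shows up: a larger capacity lets the reader run further ahead of the writer, at the cost of holding more copied rows in memory at once.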

> Compactions are slow
> --------------------
>
>                 Key: CASSANDRA-18773
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18773
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local/Compaction
>            Reporter: Cameron Zemek
>            Priority: Normal
>             Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
>         Attachments: 18773.patch, compact-poc.patch, flamegraph.png, stress.yaml
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> I have noticed that compactions involving a lot of sstables (for example,
> major compactions) are very slow. I have attached a cassandra-stress profile
> that can generate such a dataset under ccm. In my local test I have 2567
> sstables of 4 MB each.
> I added code to track the wall-clock time of various parts of the code. One
> problematic part is the ManyToOne constructor: tracing through the code, a
> ManyToOne is created over all the sstable iterators for every partition. In my
> local test I get a measly 60 KB/s read speed, bottlenecked on a single CPU
> core (since this code is single-threaded), with 85% of the wall-clock time
> spent in the ManyToOne constructor.
> As another data point showing that the merge-iterator part of the code is the
> problem, the cfstats command from [https://github.com/instaclustr/cassandra-sstable-tools/],
> which reads all the sstables but does no merging, gets a 26 MB/s read speed.
> Tracking back from the ManyToOne call, I see this in
> UnfilteredPartitionIterators::merge:
> {code:java}
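>                 for (int i = 0; i < toMerge.size(); i++)
>                 {
>                     // A null entry means this sstable has no data for the
>                     // partition; substitute a shared empty row iterator.
>                     if (toMerge.get(i) == null)
>                     {
>                         if (null == empty)
>                             empty = EmptyIterators.unfilteredRow(metadata, partitionKey, isReverseOrder);
>                         toMerge.set(i, empty);
>                     }
>                 }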
>  {code}
> I'm not sure what the purpose of creating these empty rows is. On a whim, I
> removed all these empty iterators before passing them to ManyToOne; all the
> wall-clock time then shifted to CompactionIterator::hasNext(), and the read
> speed increased to 1.5 MB/s.
> So there seem to be further bottlenecks in this code path, but the first is
> ManyToOne and having to build it for every partition read.
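To illustrate the idea of dropping empty sources before building the merge, here is a generic k-way merge over sorted iterators using java.util.PriorityQueue. It is a swapped-in textbook technique, not Cassandra's ManyToOne; SparseMergeSketch and Head are hypothetical names, and unlike Cassandra's merge it does not reconcile equal keys, so duplicates across sources are all emitted. Whether skipping the empty iterators is safe for Cassandra's reducer is not established here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch, not Cassandra's MergeIterator/ManyToOne.
public final class SparseMergeSketch
{
    /** A source iterator paired with its current head element. */
    private static final class Head<T>
    {
        final Iterator<T> source;
        T current;

        Head(Iterator<T> source)
        {
            this.source = source;
            this.current = source.next();
        }

        /** Moves to the next element; returns false when the source is exhausted. */
        boolean advance()
        {
            if (!source.hasNext())
                return false;
            current = source.next();
            return true;
        }
    }

    /**
     * Merges sorted sources into one sorted list. Null or empty sources are
     * filtered out before the heap is built, so heap operations only involve
     * sources that actually contain data.
     */
    public static <T> List<T> merge(List<Iterator<T>> sources, Comparator<? super T> cmp)
    {
        PriorityQueue<Head<T>> heap =
            new PriorityQueue<>((a, b) -> cmp.compare(a.current, b.current));
        for (Iterator<T> source : sources)
            if (source != null && source.hasNext())
                heap.add(new Head<>(source));

        List<T> out = new ArrayList<>();
        while (!heap.isEmpty())
        {
            // Remove before mutating 'current' so the heap ordering stays valid.
            Head<T> head = heap.poll();
            out.add(head.current);
            if (head.advance())
                heap.add(head);
        }
        return out;
    }
}
```

The filtering pass is still one cheap check per source, but every heap insertion and per-element comparison then scales with the number of non-empty sources rather than with the total number of sstables.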


