[ 
https://issues.apache.org/jira/browse/CASSANDRA-14605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Lynch updated CASSANDRA-14605:
-------------------------------------
    Attachment: sstable_reopen.svg

> Major compaction of LCS tables very slow
> ----------------------------------------
>
>                 Key: CASSANDRA-14605
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14605
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Compaction
>         Environment: AWS, i3.4xlarge instance (very fast local nvme storage), 
> Linux 4.13
> Cassandra 3.0.16
>            Reporter: Joseph Lynch
>            Assignee: Benedict
>            Priority: Minor
>              Labels: lcs, performance
>         Attachments: slow_major_compaction_lcs.svg, sstable_reopen.svg
>
>
> We've recently started deploying 3.0.16 more heavily in production and today 
> I noticed that full compaction of LCS tables takes a much longer time than it 
> should. In particular it appears to be faster to convert a large dataset to 
> STCS, run full compaction, and then convert it to LCS (with re-leveling) than 
> it is to just run full compaction on LCS (with re-leveling).
> I was able to get a CPU flame graph showing 50% of the major compaction's CPU 
> time being spent in 
> [{{SSTableRewriter::maybeReopenEarly}}|https://github.com/apache/cassandra/blob/6ba2fb9395226491872b41312d978a169f36fcdb/src/java/org/apache/cassandra/io/sstable/SSTableRewriter.java#L184]
>  calling 
> [{{SSTableRewriter::moveStarts}}|https://github.com/apache/cassandra/blob/6ba2fb9395226491872b41312d978a169f36fcdb/src/java/org/apache/cassandra/io/sstable/SSTableRewriter.java#L223].
> I've attached the flame graph, which I generated by running Cassandra with 
> {{-XX:+PreserveFramePointer}}, using jstack to find the compaction thread's 
> native thread id (nid), and then using perf to sample that thread's on-CPU 
> time:
> {noformat}
> perf record -t <compaction thread> -o <output file> -F 49 -g sleep 60 >/dev/null
> {noformat}
> I took this data and collapsed it using the steps described in [Brendan 
> Gregg's Java in Flames blog 
> post|https://medium.com/netflix-techblog/java-in-flames-e763b3d32166] 
> (Instructions section) to generate the graph.
> On this dataset at least (700GB of data compressed, 2.2TB uncompressed), we 
> are spending 50% of our CPU time in {{moveStarts}}, and I am not sure we need 
> to call it as frequently as we do. I'll see if I can come up with a clean 
> reproduction to confirm whether it's a general problem or specific to this 
> dataset.
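
One practical detail in the jstack step above: on Java 8, jstack prints the native thread id as a hexadecimal {{nid=0x...}} field, while {{perf record -t}} takes a decimal thread id. The sketch below shows one way to do that conversion. The helper name {{nid_to_tid}} and the sample thread header are made up for illustration and are not taken from the report.

```shell
#!/usr/bin/env bash
# Sketch: convert the hex nid from a jstack thread header into the decimal
# thread id that `perf record -t` expects. nid_to_tid is a hypothetical helper.
nid_to_tid() {
  local line="$1" nid
  # Pull out the hex digits that follow "nid=0x".
  nid=$(printf '%s\n' "$line" | sed -n 's/.*nid=0x\([0-9a-fA-F]*\).*/\1/p')
  # Fail if the line has no nid field.
  [ -n "$nid" ] || return 1
  # printf interprets the 0x prefix and prints the value in decimal.
  printf '%d\n' "0x$nid"
}

# Illustrative thread header (values invented for this example).
sample='"CompactionExecutor:1" #42 daemon prio=1 os_prio=4 tid=0x00007f3c5c0a1000 nid=0x1f4a runnable [0x00007f3c2d5fe000]'
nid_to_tid "$sample"   # prints 8010
```

The resulting number is what would go in place of {{<compaction thread>}} in the perf command quoted above.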



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

