[ 
https://issues.apache.org/jira/browse/CASSANDRA-12764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15563347#comment-15563347
 ] 

Tom van der Woerdt commented on CASSANDRA-12764:
------------------------------------------------

Oh, nice to finally link the IRC name to the Jira name :)

Yes, it was a lot faster. Here's a graph showing what happened over the last 
four days: https://i.imgur.com/AdWCCrR.png (it graphs inode usage; divide by 8 
for the sstable count)
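
For anyone reading the numbers off the graph, the conversion can be sketched 
as below. The factor of 8 is taken from the note above; the function name and 
the sample inode figure are made up for illustration.

```python
# Rough conversion from inode usage to sstable count, using the
# "divide by 8" rule of thumb from the comment above.
INODES_PER_SSTABLE = 8  # factor quoted in the comment, not measured here


def estimate_sstables(inodes_used: int) -> int:
    """Return the approximate number of sstables for a given inode count."""
    return inodes_used // INODES_PER_SSTABLE


if __name__ == "__main__":
    # Hypothetical reading: 800,000 inodes in use on the data volume.
    print(estimate_sstables(800_000))
```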

The red line is the node that started the mess. A botched repair[1] created a 
nice 100k sstables on it. This was noticed and cleaned up.

Sadly, it had already synced those 100k sstables to other nodes, which duly 
started compacting the large number of files away. But then the regular 
automation jobs started a repair on the node I had wiped, streaming all the 
files all over the place :( I was unaware of this until it was too late, and 
suddenly a lot of nodes in the cluster had 100k sstables :)

The sstable count was going down, but very, very slowly, so I figured I'd hop 
on IRC, where [~jjirsa] and [~brandon.williams] helped find a workaround (the 
table move). I applied it to the most broken node first. On the graph that's 
the red line; look for the change in slope at the 10/10 boundary. This morning 
my script broke, so the final sstables went the slow route, but it finished, 
and as you can see the scripted version is much faster than just letting 
compaction run. I'm in the process of applying it to the two most broken nodes 
now, and will let the others just finish.

Anyway, that's the story of how this happened, which was totally my fault :) 
Now I'm just hoping that my mistake can lead to improvements in compaction 
performance.

Tom


[1]: subrange repair (similar to BrianGallew's code) on an LCS table, with 256 
vnodes, and most data not passing validation.

> Compaction performance issues with many sstables, during transaction commit 
> phase
> ---------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-12764
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12764
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Compaction
>            Reporter: Tom van der Woerdt
>              Labels: lcs
>
> An issue with a script flooded my cluster with sstables. There is now a table 
> with 100k sstables, all on the order of KBytes, and it's taking a long time 
> (ETA 20 days) to compact, even though the table is only ~30GB.
> Stack trace:
> {noformat}
> "CompactionExecutor:308" #7541 daemon prio=1 os_prio=4 tid=0x00007fa22af35400 nid=0x41eb runnable [0x00007fdbea48d000]
>    java.lang.Thread.State: RUNNABLE
>       at java.util.TimSort.countRunAndMakeAscending(TimSort.java:360)
>       at java.util.TimSort.sort(TimSort.java:220)
>       at java.util.Arrays.sort(Arrays.java:1438)
>       at com.google.common.collect.Ordering.sortedCopy(Ordering.java:817)
>       at org.apache.cassandra.utils.IntervalTree$IntervalNode.<init>(IntervalTree.java:209)
>       at org.apache.cassandra.utils.IntervalTree$IntervalNode.<init>(IntervalTree.java:211)
>       at org.apache.cassandra.utils.IntervalTree$IntervalNode.<init>(IntervalTree.java:211)
>       at org.apache.cassandra.utils.IntervalTree$IntervalNode.<init>(IntervalTree.java:211)
>       at org.apache.cassandra.utils.IntervalTree$IntervalNode.<init>(IntervalTree.java:211)
>       at org.apache.cassandra.utils.IntervalTree$IntervalNode.<init>(IntervalTree.java:211)
>       at org.apache.cassandra.utils.IntervalTree$IntervalNode.<init>(IntervalTree.java:211)
>       at org.apache.cassandra.utils.IntervalTree$IntervalNode.<init>(IntervalTree.java:211)
>       at org.apache.cassandra.utils.IntervalTree$IntervalNode.<init>(IntervalTree.java:210)
>       at org.apache.cassandra.utils.IntervalTree$IntervalNode.<init>(IntervalTree.java:210)
>       at org.apache.cassandra.utils.IntervalTree$IntervalNode.<init>(IntervalTree.java:210)
>       at org.apache.cassandra.utils.IntervalTree$IntervalNode.<init>(IntervalTree.java:210)
>       at org.apache.cassandra.utils.IntervalTree.<init>(IntervalTree.java:50)
>       at org.apache.cassandra.db.lifecycle.SSTableIntervalTree.<init>(SSTableIntervalTree.java:40)
>       at org.apache.cassandra.db.lifecycle.SSTableIntervalTree.build(SSTableIntervalTree.java:50)
>       at org.apache.cassandra.db.lifecycle.View$4.apply(View.java:288)
>       at org.apache.cassandra.db.lifecycle.View$4.apply(View.java:283)
>       at com.google.common.base.Functions$FunctionComposition.apply(Functions.java:216)
>       at org.apache.cassandra.db.lifecycle.Tracker.apply(Tracker.java:128)
>       at org.apache.cassandra.db.lifecycle.Tracker.apply(Tracker.java:101)
>       at org.apache.cassandra.db.lifecycle.LifecycleTransaction.checkpoint(LifecycleTransaction.java:307)
>       at org.apache.cassandra.db.lifecycle.LifecycleTransaction.checkpoint(LifecycleTransaction.java:288)
>       at org.apache.cassandra.io.sstable.SSTableRewriter.doPrepare(SSTableRewriter.java:368)
>       at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.prepareToCommit(Transactional.java:173)
>       at org.apache.cassandra.db.compaction.writers.CompactionAwareWriter.doPrepare(CompactionAwareWriter.java:84)
>       at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.prepareToCommit(Transactional.java:173)
>       at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.finish(Transactional.java:184)
>       at org.apache.cassandra.db.compaction.writers.CompactionAwareWriter.finish(CompactionAwareWriter.java:94)
>       at org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:194)
>       at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>       at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:78)
>       at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:61)
>       at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:263)
>       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
> {noformat}
> IntervalTree shows up in a lot of stack traces I've taken on several nodes.
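
The stack trace above has the compaction thread sorting inside the 
IntervalTree constructor, reached from LifecycleTransaction.checkpoint during 
the commit of a compaction. A rough way to see why that could dominate with 
100k sstables: if every commit rebuilds a sorted structure over all live 
sstables, the total work grows roughly quadratically with the sstable count. 
The sketch below is only a back-of-the-envelope model, not Cassandra code; the 
n*log2(n) cost per rebuild and the number of sstables removed per compaction 
are assumptions chosen for illustration.

```python
import math


def total_rebuild_cost(sstables: int, removed_per_compaction: int) -> float:
    """Sum an assumed n*log2(n) rebuild cost over successive compactions.

    Each compaction is modelled as shrinking the live set by a fixed
    amount (hypothetical parameter) and triggering one full rebuild.
    """
    total = 0.0
    n = sstables
    while n > 1:
        total += n * math.log2(n)  # assumed cost of one sort-based rebuild
        n -= removed_per_compaction
    return total


if __name__ == "__main__":
    small = total_rebuild_cost(10_000, 10)
    large = total_rebuild_cost(100_000, 10)
    # Ten times the sstables costs far more than ten times the work.
    print(f"cost ratio: {large / small:.0f}x")
```

Under these assumptions, going from 10k to 100k sstables multiplies the 
rebuild work by over a hundred, not by ten.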



