[ https://issues.apache.org/jira/browse/CASSANDRA-11599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ruoran Wang updated CASSANDRA-11599:
------------------------------------
    Description: 
This is on a 6-node 2.1.13 cluster using the leveled compaction strategy. It happens 
during/after running an incremental repair, e.g. '/usr/bin/nodetool repair -pr -par 
-local -inc -- KEYSPACE'.
 
Initially, I found that metrics go missing while heavy compaction is going on 
(https://issues.apache.org/jira/browse/CASSANDRA-9625), because 
WrappingCompactionStrategy is blocked.
Then I saw a case where compaction got stuck (progress moved dramatically slowly). 
There were 29k sstables after the incremental repair, and I noticed tons of them 
were only 200+ bytes, containing just 1 key. Again, this was because 
WrappingCompactionStrategy was blocked.
 
My guess is that with 8 compaction executors and tons of small repaired L0 
sstables, the first thread is able to grab some (likely 32) sstables to compact. 
If that task covers a large token range, the other 7 threads will in the meantime 
iterate through the sstables trying to find something they can compact (which 
locks WrappingCompactionStrategy while calling LeveledManifest.getCandidatesFor), 
but they fail in the end, since their candidate sstables intersect with what is 
being compacted by the first thread. From a series of thread dumps, I noticed the 
thread that is doing the work always gets blocked by the other 7 threads.
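
To make that guess concrete, here is a simplified, self-contained model of the 
pattern I think the thread dumps show. The names (TinySSTable, STRATEGY_LOCK, 
findCandidates) and all the numbers are made up for the illustration; this is not 
Cassandra code, just the shape of the synchronized strategy lock plus the 
getCandidatesFor-style scan as I understand it:

{code}
import java.util.ArrayList;
import java.util.List;

public class L0ContentionSketch
{
    // One tiny repaired L0 sstable, reduced to just a token range (a single token here).
    static class TinySSTable
    {
        final long first, last;
        TinySSTable(long first, long last) { this.first = first; this.last = last; }
        boolean overlaps(long lo, long hi) { return first <= hi && last >= lo; }
    }

    static final Object STRATEGY_LOCK = new Object();       // stands in for the WrappingCompactionStrategy monitor
    static final List<TinySSTable> level0 = new ArrayList<TinySSTable>();
    static volatile long compactingLo, compactingHi;         // token range of the in-flight compaction

    // Stand-in for the getCandidatesFor(0)-style scan: walk every L0 sstable while
    // holding the strategy lock and reject anything intersecting the running compaction.
    static List<TinySSTable> findCandidates()
    {
        synchronized (STRATEGY_LOCK)
        {
            List<TinySSTable> candidates = new ArrayList<TinySSTable>();
            for (TinySSTable s : level0)
            {
                if (s.overlaps(compactingLo, compactingHi))
                    continue;                                // intersects the first thread's compaction
                candidates.add(s);
                if (candidates.size() == 32)                 // MAX_COMPACTING_L0
                    break;
            }
            return candidates;                               // stays empty in this scenario
        }
    }

    public static void main(String[] args) throws InterruptedException
    {
        for (int i = 0; i < 29000; i++)                      // ~29k one-key sstables after the repair
            level0.add(new TinySSTable(i, i));

        // The first executor grabbed 32 sstables whose combined token range happens to
        // span everything, so every remaining candidate overlaps it.
        compactingLo = 0;
        compactingHi = 29000;

        for (int t = 0; t < 7; t++)                          // the other 7 compaction executors
        {
            new Thread(new Runnable()
            {
                public void run()
                {
                    int empty = 0;
                    for (int i = 0; i < 100; i++)
                        if (findCandidates().isEmpty())
                            empty++;                         // scanned 29k sstables under the lock for nothing
                    System.out.println(Thread.currentThread().getName() + ": " + empty + "/100 scans found no candidates");
                }
            }).start();
        }

        // The one thread doing real work also needs this lock, so it queues behind
        // the fruitless scans above; with 29k sstables that adds up.
        Thread.sleep(10);
        long start = System.nanoTime();
        synchronized (STRATEGY_LOCK) { /* e.g. report progress / pick the next task */ }
        System.out.println("working thread waited " + (System.nanoTime() - start) / 1000 + " us for the strategy lock");
    }
}
{code}

Every fruitless scan holds the lock for a full pass over the 29k sstables, which 
is why the one productive compaction thread shows up as blocked in the dumps.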
 
1. I tried splitting the incremental repair into 4 token sub-ranges, which helped 
keep the sstable count down. That seems to be working.
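
For reference, the split looks roughly like the commands below. The tokens are 
placeholders for whatever four sub-ranges you carve the node's range into; -pr is 
dropped since the explicit -st/-et range already bounds what gets repaired:

{code}
/usr/bin/nodetool repair -par -local -inc -st <token0> -et <token1> -- KEYSPACE
/usr/bin/nodetool repair -par -local -inc -st <token1> -et <token2> -- KEYSPACE
/usr/bin/nodetool repair -par -local -inc -st <token2> -et <token3> -- KEYSPACE
/usr/bin/nodetool repair -par -local -inc -st <token3> -et <token4> -- KEYSPACE
{code}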

2. Another fix I tried is replacing ageSortedSSTables with a new method 
"keyCountSortedSSTables" (at org/apache/cassandra/db/compaction/LeveledManifest.java:586), 
which returns the small sstables first. Since the candidates will then be 32 very 
small sstables, the condition 'SSTableReader.getTotalBytes(candidates) > 
maxSSTableSizeInBytes' won't be met, and the compaction will simply merge those 32 
very small sstables. This helps prevent the first compaction job from working on a 
set of sstables that covers a wide token range.
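
A minimal sketch of what I mean, assuming it sits inside LeveledManifest.java next 
to the existing ageSortedSSTables (the method and its name are my addition; 
SSTableReader.estimatedKeys() is the only existing API it relies on):

{code}
// Proposed helper (my naming): order the L0 candidates by estimated key count,
// smallest first, so the tiny post-repair sstables get grouped into one compaction
// instead of a batch picked purely by age that may span a huge token range.
private static List<SSTableReader> keyCountSortedSSTables(Collection<SSTableReader> candidates)
{
    List<SSTableReader> sorted = new ArrayList<>(candidates);
    Collections.sort(sorted, new Comparator<SSTableReader>()
    {
        public int compare(SSTableReader a, SSTableReader b)
        {
            // estimatedKeys() is an index-based estimate, which is plenty to push
            // the ~200-byte one-key sstables to the front of the list.
            return Long.compare(a.estimatedKeys(), b.estimatedKeys());
        }
    });
    return sorted;
}
{code}

Sorting by SSTableReader.onDiskLength() instead would probably work just as well; 
the point is only that the smallest sstables are offered to the L0 scan first.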

I can provide more info if needed.

  was:
This is on a 6-node 2.1.13 cluster using the leveled compaction strategy. It happens 
during/after running an incremental repair, e.g. '/usr/bin/nodetool repair -pr -par 
-local -inc -- KEYSPACE'.
 
Initially, I found that metrics go missing while heavy compaction is going on 
(https://issues.apache.org/jira/browse/CASSANDRA-9625), because 
WrappingCompactionStrategy is blocked.
Then I saw a case where compaction got stuck (progress moved dramatically slowly). 
There were 29k sstables after the incremental repair, and I noticed tons of them 
were only 200+ bytes, containing just 1 key. Again, this was because 
WrappingCompactionStrategy was blocked.
 
My guess is that with 8 compaction executors and tons of small repaired L0 
sstables, the first thread is able to grab some (likely 32) sstables to compact. 
If that task covers a large token range, the other 7 threads will in the meantime 
iterate through the sstables trying to find something they can compact, but they 
fail in the end, since their candidate sstables intersect with what is being 
compacted by the first thread. From a series of thread dumps, I noticed the thread 
that is doing the work always gets blocked by the other 7 threads.
 
1. I tried splitting the incremental repair into 4 token sub-ranges, which helped 
keep the sstable count down. That seems to be working.

2. Another fix I tried is replacing ageSortedSSTables with a new method 
"keyCountSortedSSTables" (at org/apache/cassandra/db/compaction/LeveledManifest.java:586), 
which returns the small sstables first. Since the candidates will then be 32 very 
small sstables, the condition 'SSTableReader.getTotalBytes(candidates) > 
maxSSTableSizeInBytes' won't be met, and the compaction will simply merge those 32 
very small sstables. This helps prevent the first compaction job from working on a 
set of sstables that covers a wide token range.

I can provide more info if needed.


> When there are a large number of small repaired L0 sstables, compaction is 
> very slow
> ------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-11599
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11599
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: 2.1.13
>            Reporter: Ruoran Wang
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
