[
https://issues.apache.org/jira/browse/CASSANDRA-12200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jeff Jirsa reassigned CASSANDRA-12200:
--------------------------------------
Assignee: Jeff Jirsa
> Backlogged compactions can make repair on trivially small tables wait a long
> time to finish
> --------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-12200
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12200
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Wei Deng
> Assignee: Jeff Jirsa
>
> In C* 3.0 we started to use incremental repair by default. However, this
> seems to create a repair performance problem if you have a relatively
> write-heavy workload that keeps all available concurrent_compactors busy with
> active compactions.
> I was able to demonstrate this issue by the following scenario:
> 1. On a three-node C* 3.0.7 cluster, use "cassandra-stress write n=100000000"
> to generate 100GB of data with keyspace1.standard1 table using LCS (ctrl+c
> the stress client once the data size on each node reaches 35+GB).
> 2. At this point, there will be hundreds of L0 SSTables waiting for LCS to
> digest on each node, and with concurrent_compactors left at its default of 2,
> the two compaction threads are constantly busy processing the backlogged L0
> SSTables.
> 3. Now create a new keyspace called "trivial_ks" with RF=3, create a small
> two-column CQL table in it, and insert 6 records (a rough sketch of this step
> is shown after the repair output below).
> 4. Start a "nodetool repair trivial_ks" session on one of the nodes, and
> watch the following behavior:
> {noformat}
> automaton@wdengdse50google-98425b985-3:~$ nodetool repair trivial_ks
> [2016-07-13 01:57:28,364] Starting repair command #1, repairing keyspace
> trivial_ks with repair options (parallelism: parallel, primary range: false,
> incremental: true, job threads: 1, ColumnFamilies: [], dataCenters: [],
> hosts: [], # of ranges: 3)
> [2016-07-13 01:57:31,027] Repair session 27212dd0-489d-11e6-a6d6-cd06faa0aaa2
> for range [(3074457345618258602,-9223372036854775808],
> (-9223372036854775808,-3074457345618258603],
> (-3074457345618258603,3074457345618258602]] finished (progress: 66%)
> [2016-07-13 02:07:47,637] Repair completed successfully
> [2016-07-13 02:07:47,657] Repair command #1 finished in 10 minutes 19 seconds
> {noformat}
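> As referenced in step 3, the trivial_ks setup could be done roughly as follows.
> This is only a minimal sketch using the DataStax Java driver; the contact
> point, column layout, and inserted values are assumptions made for
> illustration, and only the trivial_ks.weitest table name comes from the logs
> further below:
> {noformat}
> import com.datastax.driver.core.Cluster;
> import com.datastax.driver.core.Session;
>
> public class TrivialKsSetup {
>     public static void main(String[] args) {
>         // Connect to any node of the test cluster (the address is an assumption).
>         try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
>              Session session = cluster.connect()) {
>             // Step 3: keyspace with RF=3.
>             session.execute("CREATE KEYSPACE IF NOT EXISTS trivial_ks WITH replication = "
>                     + "{'class': 'SimpleStrategy', 'replication_factor': 3}");
>             // A small two-column table; this schema is hypothetical.
>             session.execute("CREATE TABLE IF NOT EXISTS trivial_ks.weitest "
>                     + "(id int PRIMARY KEY, val text)");
>             // Insert the 6 records mentioned in step 3.
>             for (int i = 1; i <= 6; i++) {
>                 session.execute("INSERT INTO trivial_ks.weitest (id, val) "
>                         + "VALUES (" + i + ", 'v" + i + "')");
>             }
>         }
>     }
> }
> {noformat}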
> Basically, for such a small table it took 10+ minutes to finish the repair.
> Looking at debug.log for this particular repair session UUID, you will find
> that all nodes were able to pass through validation compaction within 15ms,
> but one of the nodes got stuck waiting for a compaction slot: it had to run an
> anti-compaction step before it could tell the initiating node that it was done
> with its part of the repair session, and it took 10+ minutes for a compaction
> slot to be freed up, as shown in the following debug.log entries:
> {noformat}
> DEBUG [AntiEntropyStage:1] 2016-07-13 01:57:30,956
> RepairMessageVerbHandler.java:149 - Got anticompaction request
> AnticompactionRequest{parentRepairSession=27103de0-489d-11e6-a6d6-cd06faa0aaa2}
> org.apache.cassandra.repair.messages.AnticompactionRequest@34449ff4
> <...>
> <snip>
> <...>
> DEBUG [CompactionExecutor:5] 2016-07-13 02:07:47,506 CompactionTask.java:217
> - Compacted (286609e0-489d-11e6-9e03-1fd69c5ec46c) 32 sstables to
> [/var/lib/cassandra/data/keyspace1/standard1-9c02e9c1487c11e6b9161dbd340a212f/mb-499-big,]
> to level=0. 2,892,058,050 bytes to 2,874,333,820 (~99% of original) in
> 616,880ms = 4.443617MB/s. 0 total partitions merged to 12,233,340.
> Partition merge counts were {1:12086760, 2:146580, }
> INFO [CompactionExecutor:5] 2016-07-13 02:07:47,512
> CompactionManager.java:511 - Starting anticompaction for trivial_ks.weitest
> on
> 1/[BigTableReader(path='/var/lib/cassandra/data/trivial_ks/weitest-538b07d1489b11e6a9ef61c6ff848952/mb-1-big-Data.db')]
> sstables
> INFO [CompactionExecutor:5] 2016-07-13 02:07:47,513
> CompactionManager.java:540 - SSTable
> BigTableReader(path='/var/lib/cassandra/data/trivial_ks/weitest-538b07d1489b11e6a9ef61c6ff848952/mb-1-big-Data.db')
> fully contained in range (-9223372036854775808,-9223372036854775808],
> mutating repairedAt instead of anticompacting
> INFO [CompactionExecutor:5] 2016-07-13 02:07:47,570
> CompactionManager.java:578 - Completed anticompaction successfully
> {noformat}
> Since validation compaction has its own threads outside of the regular
> compaction thread pool restricted by concurrent_compactors, the validation
> phase went through without any issue. If we could treat anti-compaction the
> same way (i.e. give it its own thread pool), we could avoid this kind of
> repair performance problem.
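> To illustrate the idea (this is not actual Cassandra code, and the class and
> method names below are made up), a dedicated anti-compaction pool could look
> roughly like this, mirroring how validation already gets its own threads:
> {noformat}
> import java.util.concurrent.ExecutorService;
> import java.util.concurrent.Executors;
> import java.util.concurrent.Future;
>
> // Sketch only: a small thread pool reserved for anti-compaction, so that
> // anti-compaction tasks no longer queue behind backlogged regular compactions
> // on the concurrent_compactors-bounded compaction executor.
> public class AnticompactionPool {
>     // One thread is plenty here; anti-compaction of a tiny table is cheap.
>     private final ExecutorService anticompactionExecutor =
>             Executors.newFixedThreadPool(1, r -> {
>                 Thread t = new Thread(r, "AnticompactionExecutor");
>                 t.setDaemon(true);
>                 return t;
>             });
>
>     // Submit the anti-compaction work (represented as a plain Runnable) to the
>     // dedicated pool instead of the shared compaction pool.
>     public Future<?> submitAnticompaction(Runnable anticompactionTask) {
>         return anticompactionExecutor.submit(anticompactionTask);
>     }
> }
> {noformat}
> With something like this, the anti-compaction for trivial_ks.weitest would
> start as soon as validation finishes, instead of waiting ~10 minutes for
> CompactionExecutor:5 to free up as in the log above.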
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)