[ https://issues.apache.org/jira/browse/CASSANDRA-8463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-8463:
--------------------------------------
    Fix Version/s: 2.1.3

> Upgrading 2.0 to 2.1 causes LCS to recompact all files
> ------------------------------------------------------
>
>                 Key: CASSANDRA-8463
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8463
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Hardware is recent 2-socket, 16-core (x2 Hyperthreaded), 
> 144G RAM, solid-state storage.
> Platform is Linux 3.2.51, Oracle JDK 64-bit 1.7.0_65.
> Heap is 32G total, 4G newsize.
> 8G/8G on-heap/off-heap memtables, offheap_buffer allocator, 0.5 
> memtable_cleanup_threshold
> concurrent_compactors: 20
>            Reporter: Rick Branson
>            Assignee: Marcus Eriksson
>             Fix For: 2.1.3
>
>
> It appears that tables configured with LCS will completely re-compact 
> themselves over some period of time after upgrading from 2.0 to 2.1 (2.0.11 
> -> 2.1.2, specifically). It started out with <10 pending tasks for an hour or 
> so, then began building up; after 12 hours there are 50-100 tasks pending 
> across the cluster. These nodes are under heavy write load, but easily kept up 
> in 2.0 (they rarely had >5 pending compaction tasks), so I don't think LCS in 
> 2.1 is actually worse; perhaps some different LCS behavior causes the SSTable 
> layout left by 2.0 to prompt the compactor to reorganize everything?
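> (For anyone following along: the pending-task numbers here are per-node 
> figures; the same count shows up in the "pending tasks" line of
>     nodetool compactionstats
> along with whatever compactions are currently running on that node.)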
> The nodes flushed ~11MB SSTables under 2.0. They're currently flushing ~36MB 
> SSTables due to the improved memtable setup in 2.1. Before I upgraded the 
> entire cluster to 2.1, I noticed the problem and tried several variations on 
> the flush size, thinking perhaps the larger SSTables in L0 were causing some 
> kind of cascading compactions. Even when they're sized roughly like the 2.0 
> flushes, the same behavior occurs. I also tried both enabling & disabling 
> STCS in L0, with no real change other than that L0 began to back up faster 
> with it disabled, so I left STCS in L0 enabled.
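> (For reference, the STCS-in-L0 fallback was toggled via the JVM system 
> property, assuming I have the exact name right, set in cassandra-env.sh:
>     JVM_OPTS="$JVM_OPTS -Dcassandra.disable_stcs_in_l0=true"
> and removed again to re-enable it; the node needs a restart either way.)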
> Tables are configured with sstable_size_in_mb: 32, which we found to be an 
> improvement over the default 160MB sstable size for compaction performance. 
> Maybe that's the wrong choice now? Otherwise, the tables use default settings. 
> Compaction has been unthrottled to help them catch up. The compaction threads 
> stay very 
> busy, with the cluster-wide CPU at 45% "nice" time. No nodes have completely 
> caught up yet. I'll update JIRA with status about their progress if anything 
> interesting happens.
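> (Concretely, with ks.tbl standing in for the real keyspace/table names, the 
> settings described above amount to roughly:
>     ALTER TABLE ks.tbl WITH compaction =
>         {'class': 'LeveledCompactionStrategy', 'sstable_size_in_mb': 32};
>     nodetool setcompactionthroughput 0    # 0 disables throttling
> per table and per node, respectively.)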
> From one node about 12 hours ago (roughly an hour after the upgrade), with 19 
> pending compaction tasks:
> SSTables in each level: [6/4, 10, 105/100, 268, 0, 0, 0, 0, 0]
> SSTables in each level: [6/4, 10, 106/100, 271, 0, 0, 0, 0, 0]
> SSTables in each level: [1, 16/10, 105/100, 269, 0, 0, 0, 0, 0]
> SSTables in each level: [5/4, 10, 103/100, 272, 0, 0, 0, 0, 0]
> SSTables in each level: [4, 11/10, 105/100, 270, 0, 0, 0, 0, 0]
> SSTables in each level: [1, 12/10, 105/100, 271, 0, 0, 0, 0, 0]
> SSTables in each level: [1, 14/10, 104/100, 267, 0, 0, 0, 0, 0]
> SSTables in each level: [9/4, 10, 103/100, 265, 0, 0, 0, 0, 0]
> Recently, with 41 pending compaction tasks:
> SSTables in each level: [4, 13/10, 106/100, 269, 0, 0, 0, 0, 0]
> SSTables in each level: [4, 12/10, 106/100, 273, 0, 0, 0, 0, 0]
> SSTables in each level: [5/4, 11/10, 106/100, 271, 0, 0, 0, 0, 0]
> SSTables in each level: [4, 12/10, 103/100, 275, 0, 0, 0, 0, 0]
> SSTables in each level: [2, 13/10, 106/100, 273, 0, 0, 0, 0, 0]
> SSTables in each level: [3, 10, 104/100, 275, 0, 0, 0, 0, 0]
> SSTables in each level: [6/4, 11/10, 103/100, 269, 0, 0, 0, 0, 0]
> SSTables in each level: [4, 16/10, 105/100, 264, 0, 0, 0, 0, 0]
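> (These per-level counts are what nodetool cfstats reports for an LCS table, 
> e.g.
>     nodetool cfstats <keyspace>.<table>
> An entry like 105/100 means 105 SSTables in a level whose cap is roughly 100 
> at this sstable_size_in_mb; the caps run 4, 10, 100, 1000, ... for L0, L1, 
> L2, L3, so L0 through L2 are each sitting just over their limits while L3 
> still has headroom.)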
> More information about the use case: writes are roughly uniform across these 
> tables. The data is "sharded" across these 8 tables by key to improve 
> compaction parallelism. Each node receives up to 75,000 writes/sec sustained 
> at peak, and a small number of reads. This is a pre-production cluster that's 
> being warmed up with new data, so the low volume of reads (~100/sec per node) 
> is just from automatic sampled-data checks; otherwise we'd just use STCS :)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
