[ https://issues.apache.org/jira/browse/CASSANDRA-8463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rick Branson updated CASSANDRA-8463:
------------------------------------
Summary: Constant compaction under LCS (was: Upgrading 2.0 to 2.1 causes
LCS to recompact all files)
> Constant compaction under LCS
> -----------------------------
>
> Key: CASSANDRA-8463
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8463
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Environment: Hardware is recent 2-socket, 16-core (x2 Hyperthreaded),
> 144G RAM, solid-state storage.
> Platform is Linux 3.2.51, Oracle JDK 64-bit 1.7.0_65.
> Heap is 32G total, 4G newsize.
> 8G/8G on-heap/off-heap memtables, offheap_buffer allocator, 0.5
> memtable_cleanup_threshold
> concurrent_compactors: 20
> Reporter: Rick Branson
> Assignee: Marcus Eriksson
> Fix For: 2.1.3
>
> Attachments: 0001-better-logging.patch, log-for-8463.txt
>
>
> It appears that tables configured with LCS completely re-compact themselves
> over time after upgrading from 2.0 to 2.1 (2.0.11 -> 2.1.2, specifically). It
> started out with <10 pending tasks for an hour or so, then the backlog began
> building; 12 hours in, there are 50-100 tasks pending across the cluster.
> These nodes are under heavy write load, but they easily kept up in 2.0 (rarely
> more than 5 pending compaction tasks), so I don't think LCS in 2.1 is actually
> worse; perhaps some difference in LCS behavior causes the layout of SSTables
> written by 2.0 to prompt the compactor to reorganize them?
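> (The pending-task counts quoted here and below are per node; counts like
> these can be pulled on any node with something along the lines of
>     nodetool compactionstats
> which prints the "pending tasks" figure plus whatever compactions are
> currently running.)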
> The nodes flushed ~11MB SSTables under 2.0; they're currently flushing ~36MB
> SSTables thanks to the improved memtable setup in 2.1. Before upgrading the
> entire cluster to 2.1, I noticed the problem and tried several variations on
> the flush size, thinking perhaps the larger SSTables landing in L0 were
> triggering some kind of cascading compactions. Even with flushes sized roughly
> like the 2.0 ones, the same behavior occurs. I also tried both enabling and
> disabling STCS in L0; neither made a real difference except that L0 backed up
> faster with STCS disabled, so I left STCS in L0 enabled.
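> (For the record, the knobs I was varying: flush size indirectly via the
> memtable settings listed in the environment above, e.g. in cassandra.yaml
>     memtable_heap_space_in_mb: 8192
>     memtable_offheap_space_in_mb: 8192
>     memtable_cleanup_threshold: 0.5
> and STCS in L0 via a startup property; I believe that one is
>     -Dcassandra.disable_stcs_in_l0=true
> set in cassandra-env.sh, but double-check the name.)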
> Tables are configured with sstable_size_in_mb: 32, which we found improved
> compaction performance over the 160MB sstable size. Maybe that's the wrong
> choice now? Otherwise the tables use default settings. Compaction has been
> unthrottled to help the nodes catch up. The compaction threads stay very busy,
> with cluster-wide CPU at 45% "nice" time, and no node has completely caught up
> yet. I'll update this ticket with their progress if anything interesting
> happens.
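> (Concretely, the per-table compaction setup and the unthrottling amount to
> roughly the following; keyspace and table names are placeholders, not our
> real schema:
>     ALTER TABLE myks.events_0 WITH compaction =
>         {'class': 'LeveledCompactionStrategy', 'sstable_size_in_mb': 32};
>     nodetool setcompactionthroughput 0
> where 0 disables compaction throttling entirely.)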
> From one node roughly 12 hours ago (about an hour after the upgrade), with 19
> pending compaction tasks:
> SSTables in each level: [6/4, 10, 105/100, 268, 0, 0, 0, 0, 0]
> SSTables in each level: [6/4, 10, 106/100, 271, 0, 0, 0, 0, 0]
> SSTables in each level: [1, 16/10, 105/100, 269, 0, 0, 0, 0, 0]
> SSTables in each level: [5/4, 10, 103/100, 272, 0, 0, 0, 0, 0]
> SSTables in each level: [4, 11/10, 105/100, 270, 0, 0, 0, 0, 0]
> SSTables in each level: [1, 12/10, 105/100, 271, 0, 0, 0, 0, 0]
> SSTables in each level: [1, 14/10, 104/100, 267, 0, 0, 0, 0, 0]
> SSTables in each level: [9/4, 10, 103/100, 265, 0, 0, 0, 0, 0]
> Recently, with 41 pending compaction tasks:
> SSTables in each level: [4, 13/10, 106/100, 269, 0, 0, 0, 0, 0]
> SSTables in each level: [4, 12/10, 106/100, 273, 0, 0, 0, 0, 0]
> SSTables in each level: [5/4, 11/10, 106/100, 271, 0, 0, 0, 0, 0]
> SSTables in each level: [4, 12/10, 103/100, 275, 0, 0, 0, 0, 0]
> SSTables in each level: [2, 13/10, 106/100, 273, 0, 0, 0, 0, 0]
> SSTables in each level: [3, 10, 104/100, 275, 0, 0, 0, 0, 0]
> SSTables in each level: [6/4, 11/10, 103/100, 269, 0, 0, 0, 0, 0]
> SSTables in each level: [4, 16/10, 105/100, 264, 0, 0, 0, 0, 0]
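> (For clarity: each list above runs L0 through L8 left to right, and an entry
> like 106/100 is count vs. the level's nominal max, shown when the level is
> over its limit. Per-table numbers like these can be pulled with e.g.
>     nodetool cfstats myks.events_0
> using the same placeholder names as above.)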
> More information about the use case: writes are roughly uniform across these
> tables. The data is "sharded" across the 8 tables by key to improve compaction
> parallelism. Each node receives up to 75,000 writes/sec sustained at peak, and
> a small number of reads. This is a pre-production cluster being warmed up with
> new data, so the low read volume (~100/sec per node) is just from automatic
> sampled-data checks; otherwise we'd just use STCS :)
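> To illustrate the sharding (the schema below is made up for illustration, not
> our real one): there are 8 structurally identical tables, events_0 through
> events_7, and clients route each write to events_<hash(key) % 8>, e.g.
>     -- hypothetical schema; one of 8 identical shard tables
>     CREATE TABLE myks.events_0 (
>         key    blob,
>         col    blob,
>         value  blob,
>         PRIMARY KEY (key, col)
>     ) WITH compaction = {'class': 'LeveledCompactionStrategy',
>                          'sstable_size_in_mb': 32};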
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)