[
https://issues.apache.org/jira/browse/CASSANDRA-7552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jonathan Ellis resolved CASSANDRA-7552.
---------------------------------------
Resolution: Not a Problem
Assignee: (was: Yuki Morishita)
"LCS falls behind" is an expected condition under heavy write load. (Even STCS
can fall behind, but it will recover faster.)
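For reference, the pending compaction backlog can be watched with nodetool compactionstats, and the compaction strategy is a per-table setting. Below is a minimal sketch, assuming the DataStax Python driver and hypothetical contact point, keyspace, and table names, of falling back from LCS to size-tiered compaction when LCS cannot keep up with the write rate:
{code}
# A minimal sketch, assuming the DataStax Python driver; the contact point
# and the keyspace/table names are hypothetical.
from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])
session = cluster.connect()

# Compaction strategy is configured per table, so a table whose LCS backlog
# keeps growing can be switched back to size-tiered compaction.
session.execute(
    "ALTER TABLE myks.mytable WITH compaction = "
    "{'class': 'SizeTieredCompactionStrategy'}")
{code}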
> Compactions Pending build up when using LCS
> -------------------------------------------
>
> Key: CASSANDRA-7552
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7552
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Reporter: Darla Baker
>
> We seem to be hitting an issue with LeveledCompactionStrategy while running
> performance tests on a 4-node Cassandra installation. We are currently using
> Cassandra 2.0.7.
> In summary, we run a test consisting of approximately 8,000 inserts/sec,
> 16,000 gets/sec, and 8,000 deletes/sec. We have a grace period of 12 hours on
> our column families.
> At this rate, we observe a stable number of pending compaction tasks for about
> 22 to 26 hours. After that period, something happens and the pending compaction
> tasks start to increase rapidly, sometimes on one or two servers, but sometimes
> on all four of them. This goes on until the uncompacted SSTables start consuming
> all the disk space, after which the Cassandra cluster generally fails.
> When this occurs, the rate of completed compaction tasks usually decreases
> over time, which seems to indicate that it takes more and more time to run
> the existing compaction tasks.
> On different occasions, I can reproduce a similar issue in less than 12
> hours. While the traffic rate remains constant, we seem to be hitting this at
> various intervals. Yesterday I could reproduce it in less than 6 hours.
> We have two different deployments on which we have tested this issue:
> 1. 4x IBM HS22, using RAMDISK as the Cassandra data directory (thus eliminating
> disk I/O)
> 2. 8x IBM HS23, with SSD disks, deployed in two "geo-redundant" data centers
> of 4 nodes each, and a latency of 50ms between the data centers.
> I can reproduce the "compaction tasks falling behind" on both of these setups,
> although it could be occurring for different reasons. Because of #1, I do
> not believe we are hitting an I/O bottleneck just yet.
> As an additional interesting note, if I artificially pause the traffic when I
> see the pending compaction task issue occurring, then:
> 1. The pending compaction tasks obviously stop increasing, but stay at the
> same number for 15 minutes (as if nothing is running).
> 2. The completed compaction tasks count falls to 0 for 15 minutes.
> 3. After 15 to 20 minutes, out of the blue, all compaction completes in less
> than 2 minutes.
> If I restart the traffic after that, the system is stable for a few hours,
> but the issue always comes back.
> We have written a small test tool that reproduces our application's Cassandra
> interaction.
> We have not successfully run a test for more than 30 hours under load, and
> every failure after that time would follow a similar pattern.
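The reporter's test tool is not attached; the sketch below is a hypothetical stand-in, assuming the DataStax Python driver, a made-up loadtest.events schema, and crude one-second pacing (a single synchronous loop will not actually sustain these rates; a real tool would need concurrency). It only illustrates the shape of the workload described above: LCS with a 12-hour gc_grace_seconds and roughly 8,000 inserts, 16,000 reads, and 8,000 deletes per second.
{code}
# Hypothetical load-generator sketch (not the reporter's tool), assuming the
# DataStax Python driver; contact point, keyspace, and table are made up.
import time
import uuid

from cassandra.cluster import Cluster

INSERTS_PER_SEC = 8000
GETS_PER_SEC = 16000
DELETES_PER_SEC = 8000

cluster = Cluster(['127.0.0.1'])
session = cluster.connect()

session.execute(
    "CREATE KEYSPACE IF NOT EXISTS loadtest WITH replication = "
    "{'class': 'SimpleStrategy', 'replication_factor': 1}")
session.execute(
    "CREATE TABLE IF NOT EXISTS loadtest.events ("
    "  id uuid PRIMARY KEY, payload text) "
    "WITH compaction = {'class': 'LeveledCompactionStrategy'} "
    "AND gc_grace_seconds = 43200")  # 12-hour grace period, as in the report

insert = session.prepare(
    "INSERT INTO loadtest.events (id, payload) VALUES (?, ?)")
select = session.prepare("SELECT payload FROM loadtest.events WHERE id = ?")
delete = session.prepare("DELETE FROM loadtest.events WHERE id = ?")

live_ids = []
while True:
    start = time.time()
    # One second's worth of work: insert, read twice as often, then delete.
    for _ in range(INSERTS_PER_SEC):
        new_id = uuid.uuid4()
        session.execute(insert, (new_id, 'x' * 200))
        live_ids.append(new_id)
    for _ in range(GETS_PER_SEC):
        if live_ids:
            session.execute(select, (live_ids[-1],))
    for _ in range(DELETES_PER_SEC):
        if live_ids:
            session.execute(delete, (live_ids.pop(0),))
    # Crude pacing: sleep out whatever is left of the second.
    time.sleep(max(0.0, 1.0 - (time.time() - start)))
{code}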
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)