[ https://issues.apache.org/jira/browse/CASSANDRA-7552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis resolved CASSANDRA-7552.
---------------------------------------
    Resolution: Not a Problem
      Assignee:     (was: Yuki Morishita)

"LCS falls behind" is an expected condition under heavy write load.  (Even STCS 
can fall behind, but it will recover faster.)
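
Operationally, "falls behind" shows up as a steadily growing pending-compaction
gauge. Below is a minimal monitoring sketch over JMX, assuming Cassandra's
default JMX port 7199 and the 2.0-era metric name; the host, polling interval,
and loop bound are illustrative assumptions, not anything prescribed by this
ticket:

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class PendingCompactionsWatcher {
        public static void main(String[] args) throws Exception {
            // Assumed: Cassandra's default JMX endpoint on the local node.
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
            try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
                MBeanServerConnection mbs = connector.getMBeanServerConnection();
                // Pending-compaction gauge as exposed by 2.0-era Cassandra metrics.
                ObjectName pending = new ObjectName(
                        "org.apache.cassandra.metrics:type=Compaction,name=PendingTasks");
                int previous = -1;
                for (int i = 0; i < 60; i++) {      // poll once a minute for an hour
                    int value = ((Number) mbs.getAttribute(pending, "Value")).intValue();
                    if (previous >= 0 && value > previous) {
                        System.out.println("pending compactions rising: "
                                + previous + " -> " + value);
                    }
                    previous = value;
                    Thread.sleep(60_000);
                }
            }
        }
    }

A sustained upward trend in that gauge, rather than any single value, is the
signal that compaction is no longer keeping up with the write rate.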

> Compactions Pending build up when using LCS
> -------------------------------------------
>
>                 Key: CASSANDRA-7552
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7552
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Darla Baker
>
> We seem to be hitting an issue with LeveledCompactionStrategy while running 
> performance tests on a 4-node Cassandra installation. We are currently using 
> Cassandra 2.0.7.
> In summary, we run a test consisting of approximately 8,000 inserts/sec, 
> 16,000 gets/sec, and 8,000 deletes/sec. We have a grace period of 12 hours on 
> our column families (a schema matching this setup is sketched after the 
> quoted report).
> At this rate, we observe a stable number of pending compaction tasks for 
> about 22 to 26 hours. After that period, something happens and the pending 
> compaction tasks start to increase rapidly, sometimes on one or two servers, 
> but sometimes on all four of them. This goes on until the uncompacted 
> SSTables start consuming all the disk space, after which the Cassandra 
> cluster generally fails.
> When this occurs, the rate of completed compaction tasks is usually 
> decreasing over time, which seems to indicate that it takes more and more 
> time to run the existing compaction tasks.
> On different occasions, I can reproduce a similar issue in less than 12 
> hours. While the traffic rate remains constant, we seem to be hitting this at 
> various intervals. Yesterday I could reproduce it in less than 6 hours.
> We have two different deployments on which we have tested this issue: 
> 1. 4x IBM HS22, using a RAM disk as the Cassandra data directory (thus 
> eliminating disk I/O) 
> 2. 8x IBM HS23, with SSD disks, deployed in two "geo-redundant" data centers 
> of 4 nodes each, and a latency of 50ms between the data centers.
> I can reproduce the "compaction tasks falling behind" on both of these 
> setups, although it could be occurring for different reasons. Because of #1, 
> I do not believe we are hitting an I/O bottleneck just yet.
> As an additional interesting note, if I artificially pause the traffic when I 
> see the pending compaction task issue occurring, then: 
> 1. The pending compaction tasks obviously stop increasing, but stay at the 
> same number for 15 minutes (as if nothing is running). 
> 2. The completed compaction task count falls to 0 for 15 minutes. 
> 3. After 15 to 20 minutes, out of the blue, all compactions complete in less 
> than 2 minutes.
> If I restart the traffic after that, the system is stable for a few hours, 
> but the issue always comes back.
> We have written a small test tool that reproduces our application's Cassandra 
> interaction.
> We have not successfully run a test for more than 30 hours under load, and 
> every failure after that time followed a similar pattern.
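
For context on the reported configuration, here is roughly what a table set up
as described would look like: LeveledCompactionStrategy with a 12-hour grace
period (gc_grace_seconds = 43200). The keyspace, table, column names, and
contact point are hypothetical, sketched against the 2.0-era DataStax Java
driver:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class SchemaSetup {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder()
                    .addContactPoint("127.0.0.1")   // assumed contact point
                    .build();
            try {
                Session session = cluster.connect();
                session.execute(
                        "CREATE KEYSPACE IF NOT EXISTS perftest WITH replication = "
                        + "{'class': 'SimpleStrategy', 'replication_factor': 3}");
                // LCS plus the 12-hour grace period described in the report.
                session.execute(
                        "CREATE TABLE IF NOT EXISTS perftest.events ("
                        + "  id uuid PRIMARY KEY,"
                        + "  payload blob"
                        + ") WITH compaction = {'class': 'LeveledCompactionStrategy'}"
                        + "  AND gc_grace_seconds = 43200");
            } finally {
                cluster.close();
            }
        }
    }

Under a delete-heavy workload like the one described, this configuration is
demanding: tombstones can only be purged by compaction once the grace period
has expired, so a compaction backlog also delays space reclamation, consistent
with the disk-exhaustion pattern reported above.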


