[ https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14203751#comment-14203751 ]

Nikolai Grigoriev commented on CASSANDRA-7949:
----------------------------------------------

Here is another extreme (but, unfortunately, real) example of LCS going a bit 
crazy.

{code}
# nodetool cfstats myks.mytable
Keyspace: myks
        Read Count: 3006212
        Read Latency: 21.02595119106703 ms.
        Write Count: 11226340
        Write Latency: 1.8405579886231844 ms.
        Pending Tasks: 0
                Table: wm_contacts
                SSTable count: 6530
                SSTables in each level: [2369/4, 10, 104/100, 1043/1000, 3004, 0, 0, 0, 0]
                Space used (live), bytes: 1113384288740
                Space used (total), bytes: 1113406795020
                SSTable Compression Ratio: 0.3307170610260717
                Number of keys (estimate): 26294144
                Memtable cell count: 782994
                Memtable data size, bytes: 213472460
                Memtable switch count: 3493
                Local read count: 3006239
                Local read latency: 21.026 ms
                Local write count: 11226517
                Local write latency: 1.841 ms
                Pending tasks: 0
                Bloom filter false positives: 41835779
                Bloom filter false ratio: 0.97500
                Bloom filter space used, bytes: 19666944
                Compacted partition minimum bytes: 104
                Compacted partition maximum bytes: 3379391
                Compacted partition mean bytes: 139451
                Average live cells per slice (last five minutes): 444.0
                Average tombstones per slice (last five minutes): 0.0
{code}

{code}
# nodetool compactionstats
pending tasks: 190
          compaction type        keyspace           table       completed           total      unit  progress
               Compaction      myks        mytable2      7198353690      7446734394     bytes    96.66%
               Compaction      myks         mytable      4851429651     10717052513     bytes    45.27%
Active compaction remaining time :   0h00m04s
{code}


Note the cfstats: 2369 SSTables sitting in L0 against a target of 4. Yet C* is 
sitting quietly, compacting the data using 2 cores out of 32.

Once it gets into this state I immediately start seeing large SSTables forming 
- instead of 256MB, SSTables of 1-2GB and more start appearing. And that 
creates a snowball effect.
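
A minimal sketch of how this backlog can be watched from the outside, simply by
parsing the "SSTables in each level" line of nodetool cfstats output (the class
name and the 32-SSTable alert threshold below are arbitrary choices for
illustration, not Cassandra defaults):

{code}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Illustrative only: runs `nodetool cfstats <keyspace>.<table>` and reports
 * the L0 SSTable count so a pile-up like the one above can be alerted on.
 */
public class L0Watcher {
    // Matches lines like: "SSTables in each level: [2369/4, 10, 104/100, ...]"
    private static final Pattern LEVELS =
            Pattern.compile("SSTables in each level: \\[([^\\]]+)\\]");

    public static void main(String[] args) throws Exception {
        String target = args.length > 0 ? args[0] : "myks.mytable";
        int threshold = 32; // arbitrary alert threshold for this sketch

        Process p = new ProcessBuilder("nodetool", "cfstats", target).start();
        try (BufferedReader r =
                new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                Matcher m = LEVELS.matcher(line);
                if (m.find()) {
                    // First entry is L0; it may look like "2369/4" (count/target) or just "2369".
                    String l0 = m.group(1).split(",")[0].trim().split("/")[0];
                    int count = Integer.parseInt(l0);
                    System.out.println("L0 sstables: " + count
                            + (count > threshold ? "  <-- L0 backlog, LCS is falling behind" : ""));
                }
            }
        }
        p.waitFor();
    }
}
{code}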



> LCS compaction low performance, many pending compactions, nodes are almost idle
> -------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-7949
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7949
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: DSE 4.5.1-1, Cassandra 2.0.8
>            Reporter: Nikolai Grigoriev
>         Attachments: iostats.txt, nodetool_compactionstats.txt, 
> nodetool_tpstats.txt, pending compactions 2day.png, system.log.gz, vmstat.txt
>
>
> I've been evaluating a new cluster of 15 nodes (32 cores, 6x800GB SSD disks + 
> 2x600GB SAS, 128GB RAM, OEL 6.5) and I've built a simulator that creates a 
> load similar to the load of our future product. Before running the simulator 
> I had to pre-generate enough data. This was done with Java code and the 
> DataStax Java driver. Without going deep into details, two tables have been 
> generated. Each table currently has about 55M rows, with between a few dozen 
> and a few thousand columns in each row.
> This data generation process produced a massive amount of non-overlapping 
> data, so the activity was write-only and highly parallel. This is not the 
> type of traffic the system will ultimately have to deal with - in the future 
> it will be a mix of reads and updates to existing data. This is just to 
> explain the choice of LCS, not to mention the expensive SSD disk space.
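> For illustration, the generation code was essentially doing something along 
> these lines - a simplified, hypothetical sketch (contact point, keyspace, 
> table and column names are made up; the real wide-row schema is not 
> reproduced here):
> {code}
> import com.datastax.driver.core.*;
> import java.util.UUID;
> import java.util.concurrent.Semaphore;
>
> public class LoadGenerator {
>     public static void main(String[] args) throws Exception {
>         Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
>         Session session = cluster.connect("myks");
>         PreparedStatement insert = session.prepare(
>                 "INSERT INTO mytable (partition_key, column_name, value) VALUES (?, ?, ?)");
>
>         // Cap the number of in-flight async writes so the client does not run
>         // out of memory while keeping the cluster saturated with inserts.
>         Semaphore inFlight = new Semaphore(512);
>         for (long row = 0; row < 55_000_000L; row++) {
>             UUID partitionKey = UUID.randomUUID();   // unique keys => non-overlapping data
>             for (int col = 0; col < 100; col++) {    // dozens..thousands of columns per row
>                 inFlight.acquire();
>                 ResultSetFuture f = session.executeAsync(
>                         insert.bind(partitionKey, "col" + col, "some payload"));
>                 f.addListener(inFlight::release, Runnable::run);
>             }
>         }
>         cluster.close();
>     }
> }
> {code}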
> At some point while generating the data I noticed that the compactions 
> started to pile up. I knew I was overloading the cluster, but I still wanted 
> the generation test to complete, expecting to give the cluster enough time 
> afterwards to finish the pending compactions and get ready for real traffic.
> However, after the storm of write requests had been stopped, I noticed that 
> the number of pending compactions remained constant (and even climbed up a 
> little bit) on all nodes. After trying to tune some parameters (like setting 
> the compaction bandwidth cap to 0) I noticed a strange pattern: the nodes 
> were compacting one of the CFs in a single stream using virtually no CPU and 
> no disk I/O. This process was taking hours. It would then be followed by a 
> short burst of a few dozen compactions running in parallel (CPU at 2000%, 
> some disk I/O - up to 10-20%) before getting stuck again for many hours doing 
> one compaction at a time. So it looks like this:
> # nodetool compactionstats
> pending tasks: 3351
>           compaction type        keyspace           table       completed           total      unit  progress
>                Compaction      myks     table_list1     66499295588     1910515889913     bytes     3.48%
> Active compaction remaining time :        n/a
> # df -h
> ...
> /dev/sdb        1.5T  637G  854G  43% /cassandra-data/disk1
> /dev/sdc        1.5T  425G  1.1T  29% /cassandra-data/disk2
> /dev/sdd        1.5T  429G  1.1T  29% /cassandra-data/disk3
> # find . -name **table_list1**Data** | grep -v snapshot | wc -l
> 1310
> Among these files I see:
> 1043 files of 161Mb (my sstable size is 160Mb)
> 9 large files - 3 between 1 and 2GB, 3 of 5-8GB, and one each of 55GB, 70GB 
> and 370GB
> 263 files of various sizes - between a few dozen KB and 160MB
> I ran the heavy load for about 1.5 days, it has now been close to 3 days 
> since then, and the number of pending compactions does not go down.
> I have applied one of the not-so-obvious recommendations - disabling 
> multithreaded compaction - and that seems to be helping a bit: some nodes 
> have started to show fewer pending compactions. About half of the cluster, 
> in fact. But even there they sit idle most of the time, lazily compacting in 
> one stream with CPU at ~140% and occasionally doing bursts of compaction 
> work for a few minutes.
> I am wondering if this is really a bug or something in the LCS logic that 
> would manifest itself only in such an edge case scenario where I have loaded 
> lots of unique data quickly.
> By the way, I see this pattern only for one of the two tables - the one that 
> has about 4 times more data than the other (space-wise; the number of rows 
> is the same). It looks like all these pending compactions are really only 
> for that larger table.
> I'll be attaching the relevant logs shortly.


