[jira] [Updated] (CASSANDRA-12071) Regression in flushing throughput under load after CASSANDRA-6696

Ariel Weisberg (JIRA) Wed, 22 Jun 2016 13:50:57 -0700

     [ 
https://issues.apache.org/jira/browse/CASSANDRA-12071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Ariel Weisberg updated CASSANDRA-12071:
---------------------------------------
    Description: 
Flushing used to allow a ColumnFamilyStore to have multiple Memtables flushing 
at once, and multiple ColumnFamilyStores could flush at the same time. Now only 
a single flush of any ColumnFamilyStore and Memtable can run in the C* process 
at a time, and the number of threads applied to that flush is bounded by the 
number of disks in JBOD.

This works OK most of the time, but occasionally flushing is a little slower, 
ingest outstrips it, and ingest then blocks on available memory. At that point 
you see stalls of several seconds that cause timeouts.

This is a problem for reasonable configurations that don't use JBOD but have 
access to a fast disk that can handle some IO queuing (RAID, SSD).
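
To make the stall mechanism concrete, here is a toy discrete-time model in 
Java. It is only a sketch under stated assumptions and is not Cassandra's 
flush code: the class name, method name, and all numbers are hypothetical. 
Each tick, the flushers drain a fixed amount from a bounded memtable pool, 
and ingest is counted as stalled whenever its next batch does not fit.

```java
public final class FlushBackpressureSketch {

    /**
     * Counts the ticks in which ingest had to wait because the (hypothetical)
     * memtable pool was full. All units are arbitrary.
     */
    static int stalledTicks(int ticks,
                            int ingestPerTick,
                            int flushPerTickPerFlusher,
                            int concurrentFlushes,
                            int poolCapacity) {
        int used = 0;     // memory currently held by unflushed data
        int stalled = 0;  // ticks in which ingest could not proceed
        for (int t = 0; t < ticks; t++) {
            // Flushers drain first; total drain scales with flush concurrency.
            int drained = Math.min(used, flushPerTickPerFlusher * concurrentFlushes);
            used -= drained;
            if (used + ingestPerTick <= poolCapacity) {
                used += ingestPerTick;  // the write batch fits in the pool
            } else {
                stalled++;              // ingest blocks on available memory
            }
        }
        return stalled;
    }

    public static void main(String[] args) {
        // One flush at a time, slightly slower than ingest: stalls appear.
        System.out.println("1 flusher:  " + stalledTicks(100, 10, 8, 1, 50));
        // Two concurrent flushes on a disk that tolerates IO queuing: no stalls.
        System.out.println("2 flushers: " + stalledTicks(100, 10, 8, 2, 50));
    }
}
```

In this model a single flusher that is only slightly slower than ingest 
fills the pool and then stalls periodically, while allowing a second 
concurrent flush removes the stalls entirely.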

You can reproduce this on beefy hardware (12 cores / 24 threads, 64 GB of RAM, 
SSD) if you unthrottle compaction or set it to something like 64 MB/s, run with 
8 compaction threads, and run stress with the default write workload and a 
reasonable number of threads. I tested with 96.

The stalls started after about 60 GB of data had been loaded.

  was:
The way flushing used to work is that a ColumnFamilyStore could have multiple 
memtables flushing at once. The way it works now there can be only a single 
flush of any memtable running in the C* process, and the number of threads 
applied to that flush is bounded by the number of disks in JBOD.

This works ok most of the time but occasionally flushing will be a little 
slower and ingest will outstrip it and then block on available memory. At this 
point you see several second stalls that cause timeouts.

This is a problem for reasonable configurations that don't use JBOD but have 
access to a fast disk that can handle some IO queuing (RAID, SSD).

You can reproduce on beefy hardware (12 cores 24 threads, 64 gigs of RAM, SSD) 
if you unthrottle compaction or set it to something like 64 megabytes/second 
and run with 8 compaction threads and stress with the default write workload 
and a reasonable number of threads. I tested with 96.

It started happening after about 60 gigabytes of data was loaded.


> Regression in flushing throughput under load after CASSANDRA-6696
> -----------------------------------------------------------------
>
>                 Key: CASSANDRA-12071
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12071
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local Write-Read Paths
>            Reporter: Ariel Weisberg



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
