[ https://issues.apache.org/jira/browse/CASSANDRA-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
C. Scott Andreas updated CASSANDRA-10229:
-----------------------------------------
    Component/s: Stress

> Fix cassandra-stress gaussian behaviour for shuffling the distribution, to
> mitigate read perf after a major compaction
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-10229
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10229
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Stress
>            Reporter: Alan Boudreault
>            Priority: Minor
>              Labels: perfomance, stress
>         Attachments: users-caching.yaml
>
> TITLE WAS: BAD READ PERFORMANCE AFTER A MAJOR COMPACTION
>
> I am trying to understand what I am seeing. My scenario is very basic: a
> simple users table with key cache and row cache disabled. I write 50M
> elements, then read 5M random elements. Read performance is reasonable
> BEFORE a major compaction of the data, but I see a ~3x performance
> regression AFTER I run a major compaction.
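The "~3x" figure can be sanity-checked against the op rates in the results quoted in this ticket (9149 vs. 3275 ops/s on 3.0, and 21348 vs. 5262 ops/s on 2.1) with a quick arithmetic check:

```python
# Op rates reported in this ticket (reads/s), before and after major compaction.
rates = {
    "3.0": (9149, 3275),
    "2.1": (21348, 5262),
}

for version, (before, after) in rates.items():
    factor = before / after
    print(f"{version}: {before} -> {after} ops/s, {factor:.1f}x slowdown")
    # 3.0: 9149 -> 3275 ops/s, 2.8x slowdown
    # 2.1: 21348 -> 5262 ops/s, 4.1x slowdown
```

So the regression is roughly 2.8x on 3.0 and 4.1x on 2.1, consistent with the "~3x" characterization.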
> Here are the read performance numbers for my scenario:
> {code}
> 3.0 before a major compaction (Key cache and row cache disabled); note that
> these are the numbers from 50M, I see the same with 5M
> ==================================================================================
> Results:
> op rate                   : 9149 [read:9149]
> partition rate            : 9149 [read:9149]
> row rate                  : 9149 [read:9149]
> latency mean              : 32.8 [read:32.8]
> latency median            : 31.2 [read:31.2]
> latency 95th percentile   : 47.2 [read:47.2]
> latency 99th percentile   : 55.0 [read:55.0]
> latency 99.9th percentile : 66.4 [read:66.4]
> latency max               : 305.4 [read:305.4]
> Total partitions          : 50000000 [read:50000000]
> Total errors              : 0 [read:0]
> total gc count            : 0
> total gc mb               : 0
> total gc time (s)         : 0
> avg gc time(ms)           : NaN
> stdev gc time(ms)         : 0
> Total operation time      : 01:31:05
> END
> -rw-rw-r-- 1 aboudreault aboudreault 4.7G Aug 31 08:51 ma-1024-big-Data.db
> -rw-rw-r-- 1 aboudreault aboudreault 4.9G Aug 31 09:08 ma-1077-big-Data.db
>
> 3.0 after a major compaction (Key cache and row cache disabled); note that
> these are the numbers from 50M, I see the same with 5M
> ================================================================================
> Results:
> op rate                   : 3275 [read:3275]
> partition rate            : 3275 [read:3275]
> row rate                  : 3275 [read:3275]
> latency mean              : 91.6 [read:91.6]
> latency median            : 88.8 [read:88.8]
> latency 95th percentile   : 107.2 [read:107.2]
> latency 99th percentile   : 116.0 [read:116.0]
> latency 99.9th percentile : 125.5 [read:125.5]
> latency max               : 249.0 [read:249.0]
> Total partitions          : 50000000 [read:50000000]
> Total errors              : 0 [read:0]
> total gc count            : 0
> total gc mb               : 0
> total gc time (s)         : 0
> avg gc time(ms)           : NaN
> stdev gc time(ms)         : 0
> Total operation time      : 04:14:26
> END
> -rw-rw-r-- 1 aboudreault aboudreault 9.5G Aug 31 09:40 ma-1079-big-Data.db
>
> 2.1 before major compaction (Key cache and row cache disabled)
> ==============================================================
> Results:
> op rate                   : 21348 [read:21348]
> partition rate            : 21348 [read:21348]
> row rate                  : 21348 [read:21348]
> latency mean              : 14.1 [read:14.1]
> latency median            : 8.0 [read:8.0]
> latency 95th percentile   : 38.5 [read:38.5]
> latency 99th percentile   : 60.8 [read:60.8]
> latency 99.9th percentile : 99.2 [read:99.2]
> latency max               : 229.2 [read:229.2]
> Total partitions          : 5000000 [read:5000000]
> Total errors              : 0 [read:0]
> total gc count            : 0
> total gc mb               : 0
> total gc time (s)         : 0
> avg gc time(ms)           : NaN
> stdev gc time(ms)         : 0
> Total operation time      : 00:03:54
> END
>
> 2.1 after major compaction (Key cache and row cache disabled)
> =============================================================
> Results:
> op rate                   : 5262 [read:5262]
> partition rate            : 5262 [read:5262]
> row rate                  : 5262 [read:5262]
> latency mean              : 57.0 [read:57.0]
> latency median            : 55.5 [read:55.5]
> latency 95th percentile   : 69.4 [read:69.4]
> latency 99th percentile   : 83.3 [read:83.3]
> latency 99.9th percentile : 197.4 [read:197.4]
> latency max               : 1169.0 [read:1169.0]
> Total partitions          : 5000000 [read:5000000]
> Total errors              : 0 [read:0]
> total gc count            : 0
> total gc mb               : 0
> total gc time (s)         : 0
> avg gc time(ms)           : NaN
> stdev gc time(ms)         : 0
> Total operation time      : 00:15:50
> END
> {code}
> I can reproduce that read performance regression on EC2 and locally. To
> reproduce:
> 1. Launch a 1-node cluster (2.1, 2.2 or 3.0).
> 2. Set the compaction throughput to 0 (needs a restart, IIRC).
> 3. Write 50M elements (so we get the same sstable size for the test). The
> yaml profile is attached to this ticket. Ensure you are using stress from
> apache/cassandra-3.0; trunk is broken at the moment.
> {code}
> cassandra-stress user profile=`pwd`/users-caching.yaml ops\(insert=1\) n=50M -rate threads=100
> {code}
> 4. Flush the data and wait for the auto-compaction to finish. You should get
> around 2-6 sstables when it's done.
> 5.
> Restart Cassandra.
> 6. Read 5M elements:
> {code}
> cassandra-stress user profile=/path/to/users-caching.yaml ops\(read=1\) n=5M -rate threads=300
> {code}
> 7. Restart C*, then start a major compaction and wait for it to finish.
> {code}
> ccm stop && ccm start
> ccm nodetool compact
> {code}
> 8. Read 5M elements again:
> {code}
> cassandra-stress user profile=/path/to/users-caching.yaml ops\(read=1\) n=5M -rate threads=300
> {code}
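The attached users-caching.yaml is not reproduced in this message. For readers without the attachment, a cassandra-stress user profile with a gaussian key population has roughly the following shape; this is a hypothetical sketch, not the actual attachment, and the keyspace, table, column names, and ranges here are all assumptions:

```yaml
# Hypothetical sketch of a stress profile similar to users-caching.yaml
# (NOT the actual attachment; names and ranges are assumptions).
keyspace: stresscql
table: users
table_definition: |
  CREATE TABLE users (
    userid bigint PRIMARY KEY,
    name text,
    email text
  )
columnspec:
  - name: userid
    population: gaussian(1..50000000)   # gaussian key distribution, per the ticket title
insert:
  partitions: fixed(1)
queries:
  read:
    cql: SELECT * FROM users WHERE userid = ?
    fields: samerow
```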