[ https://issues.apache.org/jira/browse/CASSANDRA-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
C. Scott Andreas updated CASSANDRA-10229:
-----------------------------------------
    Component/s: Stress

> Fix cassandra-stress gaussian behaviour for shuffling the distribution, to
> mitigate read perf after a major compaction
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-10229
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10229
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Stress
>            Reporter: Alan Boudreault
>            Priority: Minor
>              Labels: perfomance, stress
>         Attachments: users-caching.yaml
>
> TITLE WAS: BAD READ PERFORMANCE AFTER A MAJOR COMPACTION
>
> I am trying to understand what I am seeing. My scenario is very basic: a
> simple users table with key cache and row cache disabled. I write 50M
> elements, then read 5M random elements. Read performance is reasonable
> BEFORE a major compaction of the data, but I see a ~3x performance
> regression AFTER I run a major compaction.
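The "~3x" figure can be sanity-checked against the op rates in the results quoted in this ticket (9149 vs. 3275 ops/s on 3.0, and 21348 vs. 5262 ops/s on 2.1) with a quick arithmetic check:

```python
# Op rates reported in this ticket (reads/s), before and after major compaction.
rates = {
    "3.0": (9149, 3275),
    "2.1": (21348, 5262),
}

for version, (before, after) in rates.items():
    factor = before / after
    print(f"{version}: {before} -> {after} ops/s, {factor:.1f}x slowdown")
    # 3.0: 9149 -> 3275 ops/s, 2.8x slowdown
    # 2.1: 21348 -> 5262 ops/s, 4.1x slowdown
```

So the regression is roughly 2.8x on 3.0 and 4.1x on 2.1, consistent with the "~3x" characterization.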
> Here are the read performance numbers for my scenario:
> {code}
> 3.0 before a major compaction (Key cache and row cache disabled); note that
> these are the numbers from 50M, I see the same with 5M
> ==================================================================================
> Results:
> op rate                   : 9149 [read:9149]
> partition rate            : 9149 [read:9149]
> row rate                  : 9149 [read:9149]
> latency mean              : 32.8 [read:32.8]
> latency median            : 31.2 [read:31.2]
> latency 95th percentile   : 47.2 [read:47.2]
> latency 99th percentile   : 55.0 [read:55.0]
> latency 99.9th percentile : 66.4 [read:66.4]
> latency max               : 305.4 [read:305.4]
> Total partitions          : 50000000 [read:50000000]
> Total errors              : 0 [read:0]
> total gc count            : 0
> total gc mb               : 0
> total gc time (s)         : 0
> avg gc time(ms)           : NaN
> stdev gc time(ms)         : 0
> Total operation time      : 01:31:05
> END
> -rw-rw-r-- 1 aboudreault aboudreault 4.7G Aug 31 08:51 ma-1024-big-Data.db
> -rw-rw-r-- 1 aboudreault aboudreault 4.9G Aug 31 09:08 ma-1077-big-Data.db
>
> 3.0 after a major compaction (Key cache and row cache disabled); note that
> these are the numbers from 50M, I see the same with 5M
> ================================================================================
> Results:
> op rate                   : 3275 [read:3275]
> partition rate            : 3275 [read:3275]
> row rate                  : 3275 [read:3275]
> latency mean              : 91.6 [read:91.6]
> latency median            : 88.8 [read:88.8]
> latency 95th percentile   : 107.2 [read:107.2]
> latency 99th percentile   : 116.0 [read:116.0]
> latency 99.9th percentile : 125.5 [read:125.5]
> latency max               : 249.0 [read:249.0]
> Total partitions          : 50000000 [read:50000000]
> Total errors              : 0 [read:0]
> total gc count            : 0
> total gc mb               : 0
> total gc time (s)         : 0
> avg gc time(ms)           : NaN
> stdev gc time(ms)         : 0
> Total operation time      : 04:14:26
> END
> -rw-rw-r-- 1 aboudreault aboudreault 9.5G Aug 31 09:40 ma-1079-big-Data.db
>
> 2.1 before major compaction (Key cache and row cache disabled)
> ==============================================================
> Results:
> op rate                   : 21348 [read:21348]
> partition rate            : 21348 [read:21348]
> row rate                  : 21348 [read:21348]
> latency mean              : 14.1 [read:14.1]
> latency median            : 8.0 [read:8.0]
> latency 95th percentile   : 38.5 [read:38.5]
> latency 99th percentile   : 60.8 [read:60.8]
> latency 99.9th percentile : 99.2 [read:99.2]
> latency max               : 229.2 [read:229.2]
> Total partitions          : 5000000 [read:5000000]
> Total errors              : 0 [read:0]
> total gc count            : 0
> total gc mb               : 0
> total gc time (s)         : 0
> avg gc time(ms)           : NaN
> stdev gc time(ms)         : 0
> Total operation time      : 00:03:54
> END
>
> 2.1 after major compaction (Key cache and row cache disabled)
> =============================================================
> Results:
> op rate                   : 5262 [read:5262]
> partition rate            : 5262 [read:5262]
> row rate                  : 5262 [read:5262]
> latency mean              : 57.0 [read:57.0]
> latency median            : 55.5 [read:55.5]
> latency 95th percentile   : 69.4 [read:69.4]
> latency 99th percentile   : 83.3 [read:83.3]
> latency 99.9th percentile : 197.4 [read:197.4]
> latency max               : 1169.0 [read:1169.0]
> Total partitions          : 5000000 [read:5000000]
> Total errors              : 0 [read:0]
> total gc count            : 0
> total gc mb               : 0
> total gc time (s)         : 0
> avg gc time(ms)           : NaN
> stdev gc time(ms)         : 0
> Total operation time      : 00:15:50
> END
> {code}
> I can reproduce that read performance regression on EC2 and locally. To
> reproduce:
> 1. Launch a 1-node cluster (2.1, 2.2 or 3.0).
> 2. Set the compaction throughput to 0 (needs a restart, IIRC).
> 3. Write 50M elements (so we get the same sstable size for the test). The
> yaml profile is attached to this ticket. Ensure you are using stress from
> apache/cassandra-3.0; trunk is broken at the moment.
> {code}
> cassandra-stress user profile=`pwd`/users-caching.yaml ops\(insert=1\) n=50M -rate threads=100
> {code}
> 4. Flush the data and wait for the auto-compaction to finish. You should get
> around 2-6 sstables when it's done.
> 5.
> Restart Cassandra.
> 6. Read 5M elements:
> {code}
> cassandra-stress user profile=/path/to/users-caching.yaml ops\(read=1\) n=5M -rate threads=300
> {code}
> 7. Restart C*, then start a major compaction and wait for it to finish.
> {code}
> ccm stop && ccm start
> ccm nodetool compact
> {code}
> 8. Read 5M elements again:
> {code}
> cassandra-stress user profile=/path/to/users-caching.yaml ops\(read=1\) n=5M -rate threads=300
> {code}
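The attached users-caching.yaml is not reproduced in this message. For readers without the attachment, a cassandra-stress user profile with a gaussian key population has roughly the following shape; this is a hypothetical sketch, not the actual attachment, and the keyspace, table, column names, and ranges here are all assumptions:

```yaml
# Hypothetical sketch of a stress profile similar to users-caching.yaml
# (NOT the actual attachment; names and ranges are assumptions).
keyspace: stresscql
table: users
table_definition: |
  CREATE TABLE users (
    userid bigint PRIMARY KEY,
    name text,
    email text
  )
columnspec:
  - name: userid
    population: gaussian(1..50000000)   # gaussian key distribution, per the ticket title
insert:
  partitions: fixed(1)
queries:
  read:
    cql: SELECT * FROM users WHERE userid = ?
    fields: samerow
```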