[ https://issues.apache.org/jira/browse/CASSANDRA-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735367#comment-14735367 ]

Benedict commented on CASSANDRA-10229:
--------------------------------------

This is actually not unexpected. Performance for small reads depends very much 
on how the data is distributed amongst the files. A major compaction changes 
that distribution, so certain kinds of read workload will behave differently as 
a result - specifically those touching data that happens to be 
spatially/temporally proximal. A major compaction is the worst thing that can 
happen to a dataset of small partitions, for both legitimate and fake workloads.

This behaviour affects every workload we run to some degree; differences in 
compaction candidate selection can result in notable performance differences. 
This is why I have a cstar ticket open for performing a major compaction, in 
order to normalise the distribution for all branches (even if it normalises to 
the worst case). I've also previously raised the prospect of a special major 
compaction that attempts to distribute the data "normally" but consistently.

However, it occurs to me that the default stress parameters ported forward 
from the legacy stress will exacerbate this: ids are inserted sequentially, and 
the gaussian population distribution used by default for reads picks values 
from the middle of that sequence, normally distributed around the mean. As a 
result the reads are likely to hit a subset of the sstables more frequently, 
and hence hit the page cache more often, than they will after a major 
compaction (once the records are spread randomly and evenly across the single 
sstable).

Stress should probably have a feature for shuffling the gaussian distribution 
to mitigate this issue. However, if you switch to a uniform read distribution, 
the problem should at least shrink (though not necessarily disappear, since 
even with a truly random read workload the distribution on disk will affect 
page cache behaviour).
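
For a quick check with the attached profile, overriding the seed distribution 
on the read run should be enough, rather than editing the yaml; something along 
these lines should switch reads to a uniform selection (reusing the 1..50M key 
range and thread count from the repro steps below, so treat the exact numbers 
as an example only):

{code}
cassandra-stress user profile=/path/to/users-caching.yaml ops\(read=1\) n=5M \
  -pop dist=uniform\(1..50000000\) -rate threads=300
{code}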


> bad read performance after a major compaction
> ---------------------------------------------
>
>                 Key: CASSANDRA-10229
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10229
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Alan Boudreault
>              Labels: performance
>         Attachments: users-caching.yaml
>
>
> I am trying to understand what I am seeing. My scenario is very basic: a 
> simple users table with key cache and row cache disabled. I write 50M 
> elements, then read 5M random elements. The read performance is not that bad 
> BEFORE a major compaction of the data; I see a ~3x performance regression 
> AFTER I run a major compaction. 
> Here are the read performance numbers for my scenario: 
> {code}
> 3.0 before a major compaction (Key cache and row cache disabled); note that 
> these are the numbers from the 50M run, I see the same with 5M
> ==================================================================================
> Results:
> op rate                   : 9149 [read:9149]
> partition rate            : 9149 [read:9149]
> row rate                  : 9149 [read:9149]
> latency mean              : 32.8 [read:32.8]
> latency median            : 31.2 [read:31.2]
> latency 95th percentile   : 47.2 [read:47.2]
> latency 99th percentile   : 55.0 [read:55.0]
> latency 99.9th percentile : 66.4 [read:66.4]
> latency max               : 305.4 [read:305.4]
> Total partitions          : 50000000 [read:50000000]
> Total errors              : 0 [read:0]
> total gc count            : 0
> total gc mb               : 0
> total gc time (s)         : 0
> avg gc time(ms)           : NaN
> stdev gc time(ms)         : 0
> Total operation time      : 01:31:05
> END
> -rw-rw-r-- 1 aboudreault aboudreault  4.7G Aug 31 08:51 ma-1024-big-Data.db
> -rw-rw-r-- 1 aboudreault aboudreault  4.9G Aug 31 09:08 ma-1077-big-Data.db
> 3.0 after a major compaction (Key cache and row cache disabled); note that 
> these are the numbers from the 50M run, I see the same with 5M
> ================================================================================
> Results:
> op rate                   : 3275 [read:3275]
> partition rate            : 3275 [read:3275]
> row rate                  : 3275 [read:3275]
> latency mean              : 91.6 [read:91.6]
> latency median            : 88.8 [read:88.8]
> latency 95th percentile   : 107.2 [read:107.2]
> latency 99th percentile   : 116.0 [read:116.0]
> latency 99.9th percentile : 125.5 [read:125.5]
> latency max               : 249.0 [read:249.0]
> Total partitions          : 50000000 [read:50000000]
> Total errors              : 0 [read:0]
> total gc count            : 0
> total gc mb               : 0
> total gc time (s)         : 0
> avg gc time(ms)           : NaN
> stdev gc time(ms)         : 0
> Total operation time      : 04:14:26
> END
> -rw-rw-r-- 1 aboudreault aboudreault 9.5G Aug 31 09:40 ma-1079-big-Data.db
> 2.1 before major compaction (Key cache and row cache disabled)
> ==============================================================
> Results:
> op rate                   : 21348 [read:21348]
> partition rate            : 21348 [read:21348]
> row rate                  : 21348 [read:21348]
> latency mean              : 14.1 [read:14.1]
> latency median            : 8.0 [read:8.0]
> latency 95th percentile   : 38.5 [read:38.5]
> latency 99th percentile   : 60.8 [read:60.8]
> latency 99.9th percentile : 99.2 [read:99.2]
> latency max               : 229.2 [read:229.2]
> Total partitions          : 5000000 [read:5000000]
> Total errors              : 0 [read:0]
> total gc count            : 0
> total gc mb               : 0
> total gc time (s)         : 0
> avg gc time(ms)           : NaN
> stdev gc time(ms)         : 0
> Total operation time      : 00:03:54
> END
> 2.1 after major compaction (Key cache and row cache disabled)
> =============================================================
> Results:
> op rate                   : 5262 [read:5262]
> partition rate            : 5262 [read:5262]
> row rate                  : 5262 [read:5262]
> latency mean              : 57.0 [read:57.0]
> latency median            : 55.5 [read:55.5]
> latency 95th percentile   : 69.4 [read:69.4]
> latency 99th percentile   : 83.3 [read:83.3]
> latency 99.9th percentile : 197.4 [read:197.4]
> latency max               : 1169.0 [read:1169.0]
> Total partitions          : 5000000 [read:5000000]
> Total errors              : 0 [read:0]
> total gc count            : 0
> total gc mb               : 0
> total gc time (s)         : 0
> avg gc time(ms)           : NaN
> stdev gc time(ms)         : 0
> Total operation time      : 00:15:50
> END
> {code}
> I can reproduce this read performance regression on EC2 and locally. To 
> reproduce:
> 1. Launch a 1-node cluster (2.1, 2.2 or 3.0)
> 2. Set the compaction throughput to 0. (needs a restart IIRC)
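> With ccm (as used in step 7 below), one way to do this is to update the 
> config before starting the nodes, e.g.:
> {code}
> ccm updateconf 'compaction_throughput_mb_per_sec: 0'
> {code}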
> 3. Write 50M elements (so we get the same sstable size for the test). The 
> yaml profile is attached to this ticket. Ensure you are using stress from 
> apache/cassandra-3.0; trunk is broken at the moment.
> {code}
> cassandra-stress user profile=`pwd`/users-caching.yaml ops\(insert=1\) n=50M 
> -rate threads=100
> {code}
> 4. Flush the data and wait for the auto-compaction to finish. You should get 
> around 2-6 sstables when it's done.
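> With ccm, the flush can be triggered the same way as the compaction in step 
> 7, e.g.:
> {code}
> ccm nodetool flush
> {code}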
> 5. Restart Cassandra
> 6. Read 5M elements
> {code}
> cassandra-stress user profile=/path/to/users-caching.yaml ops\(read=1\) n=5M 
> -rate threads=300
> {code}
> 7. Restart C*, then start a major compaction and wait for it to finish.
> {code}
> ccm stop && ccm start
> ccm nodetool compact
> {code}
> 8. Read 5M elements
> {code}
> cassandra-stress user profile=/path/to/users-caching.yaml ops\(read=1\) n=5M 
> -rate threads=300
> {code}


