[
https://issues.apache.org/jira/browse/CASSANDRA-10326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15011341#comment-15011341
]
Ariel Weisberg commented on CASSANDRA-10326:
--------------------------------------------
I re-ran [Benedict's
workload|http://cstar.datastax.com/graph?command=one_job&stats=2082790c-8caf-11e5-b2c0-0256e416528f&metric=99.9th_latency&operation=1_user&smoothing=1&show_aggregates=true&xmin=0&xmax=957.22&ymin=0&ymax=96.25]
against the released 3.0.0 and 2.2.3. 3.0.0 did quite well, matching or beating
2.2.3 in both latency and throughput for three of the four workloads.
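For reference, a run against one of these profiles is driven through
cassandra-stress in user/profile mode with an invocation roughly like the one
below; the profile path, op weights, duration, thread count, and node name are
placeholders, not the exact settings from the cstar job:
{code}
# hypothetical invocation; profile file, op weights, thread count and node are illustrative only
cassandra-stress user profile=./trades-with-flags.yaml "ops(insert=1,1_user=1)" \
    duration=15m -rate threads=200 -node blade11b
{code}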
The summary 99.9th percentile number for 1_user is odd: a latency spike in the
last 100 seconds causes the overall number to come out worse. Until that point
3.0.0 warms up a little more slowly than 2.2.3, but it also doesn't slow down
quite as quickly.
For 2_user, 3.0.0 is consistently a hair slower than 2.2.3, but 99.9th
percentile latency is almost the same.
For 3_user, 3.0.0 has significantly higher throughput and slightly better 99.9th
percentile latency.
For 4_user, 3.0.0 has significantly higher throughput and significantly better
99.9th percentile latency.
I am not totally comfortable with the peak throughput of the stress client
relative to the potential throughput of a 3-node cluster. When I was running
stress on my desktop I found that it saturated four cores, which is more
heavyweight than I would like from a benchmark client. I will run with a
mocked-out server to find out what the peak throughput of the client is on the
cluster, so we have some idea of when we are approaching it.
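As a quick sanity check before the mocked-out run, watching the client box
during a run with ordinary Linux tools should show whether stress itself is
pegging its cores (these commands are generic, nothing cstar-specific):
{code}
# per-core utilization on the client host, sampled every 5 seconds
mpstat -P ALL 5
# CPU usage of the stress JVM itself
pidstat -u -p $(pgrep -f cassandra-stress) 5
{code}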
> Performance is worse in 3.0
> ---------------------------
>
> Key: CASSANDRA-10326
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10326
> Project: Cassandra
> Issue Type: Bug
> Reporter: Benedict
> Assignee: Ariel Weisberg
> Fix For: 3.0.x
>
>
> Performance is generally turning out to be worse after 8099, despite a number
> of unrelated performance enhancements being delivered. This isn't entirely
> unexpected, given that a great deal of time was spent optimising the old code;
> however, things appear worse than we had hoped.
> My expectation was that workloads making extensive use of CQL constructs
> would be faster post-8099; however, the latest tests performed with very large
> CQL rows, including use of collections, still exhibit performance below that
> of 2.1 and 2.2.
> Eventually, as the dataset size grows large enough and the locality of access
> is just right, the reduction in size of our dataset will yield a window
> during which some users will perform better due simply to improved page cache
> hit rates. We seem to see this in some of the tests. However, we should be at
> least as fast (and really faster) off the bat.
> The following are some large partition benchmark results, with as many as 40K
> rows per partition, running LCS. There are a number of parameters we can
> modify to see how behaviour changes and under what scenarios we might still
> be faster, but the picture painted isn't brilliant, and is consistent, so we
> should really try and figure out what's up before GA.
> [trades-with-flags (collections),
> blade11b|http://cstar.datastax.com/graph?stats=f0a17292-5a13-11e5-847a-42010af0688f&metric=op_rate&operation=1_user&smoothing=1&show_aggregates=true&xmin=0&xmax=4387.02&ymin=0&ymax=122951.4]
> [trades-with-flags (collections),
> blade11|http://cstar.datastax.com/graph?stats=e25aaaa0-5a13-11e5-ae0d-42010af0688f&metric=op_rate&operation=1_user&smoothing=1&show_aggregates=true&xmin=0&xmax=4424.75&ymin=0&ymax=130158.6]
> [trades (no collections),
> blade11|http://cstar.datastax.com/graph?stats=9b7da48e-570c-11e5-90fe-42010af0688f&metric=op_rate&operation=1_user&smoothing=1&show_aggregates=true&xmin=0&xmax=2682.46&ymin=0&ymax=142547.9]
> [~slebresne]: will you have time to look into this before GA?