[
https://issues.apache.org/jira/browse/CASSANDRA-6890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985821#comment-13985821
]
Joshua McKenzie commented on CASSANDRA-6890:
--------------------------------------------
Running cql3 native with snappy compression and a mixed load at a 1-to-1
read/write ratio looks to have normalized much of the performance differential
I was seeing (see the stress invocation sketched below the tables). Using
linux mmap as a baseline:
Raw op/s:
|threadCount|windows buffered|windows mmap|linux buffered|linux mmap|
|4|2236|2171|1953|2111|
|8|4716|4673|3955|4300|
|16|7605|7529|6795|7465|
|24|8662|9231|8341|8819|
|36|13907|13147|13237|14451|
|54|24039|24817|24177|26073|
|81|39016|43673|34154|40929|
|121|40494|49513|42658|48313|
|181|53189|53039|49691|52885|
|271|53447|55354|54842|58779|
|406|54853|54295|60108|64675|
|609|60067|56145|61823|70885|
|913|57333|58483|60763|70398|
% Comparison (op/s relative to linux mmap at the same threadCount):
|threadCount|windows buffered|windows mmap|linux buffered|linux mmap|
|4|105.92%|102.84%|92.52%|100.00%|
|8|109.67%|108.67%|91.98%|100.00%|
|16|101.88%|100.86%|91.02%|100.00%|
|24|98.22%|104.67%|94.58%|100.00%|
|36|96.24%|90.98%|91.60%|100.00%|
|54|92.20%|95.18%|92.73%|100.00%|
|81|95.33%|106.70%|83.45%|100.00%|
|121|83.82%|102.48%|88.30%|100.00%|
|181|100.57%|100.29%|93.96%|100.00%|
|271|90.93%|94.17%|93.30%|100.00%|
|406|84.81%|83.95%|92.94%|100.00%|
|609|84.74%|79.21%|87.22%|100.00%|
|913|81.44%|83.07%|86.31%|100.00%|
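For reference, results in this shape are what cassandra-stress's default auto
thread sweep produces, and the threadCount progression above matches it. A
sketch of the kind of invocation that matches the workload described (the
exact flags are my assumption, not a transcript of the runs):
{code}
# Hypothetical invocation -- adjust n= and the schema options to taste.
cassandra-stress mixed 'ratio(write=1,read=1)' n=1000000 \
    -mode native cql3 \
    -schema 'compression=SnappyCompressor'
{code}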
As Benedict indicated, an in-process page cache should make the debate between
these two paths moot. The results above are quite close to the 10% threshold
you've indicated, Jonathan; I'd be comfortable normalizing the system on
buffered I/O leading up to 3.0 to give us a single read path to migrate to an
in-process page cache. I certainly don't see a need for us to keep the mmap'ed
path on Windows, as there doesn't appear to be a performance differential when
using a more representative workload on cql3.
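For anyone following along, the difference between the two read paths boils
down to an explicit copy into a reader-owned buffer versus page-fault-driven
access to a mapped region. A minimal NIO sketch of that distinction
(illustrative only, not the actual RandomAccessReader code):
{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Illustrative sketch of the two read paths under discussion.
public class ReadPaths
{
    // Buffered path: read() copies from the OS page cache into a buffer
    // the reader owns; every read pays for that copy.
    static ByteBuffer readBuffered(FileChannel channel, long position, int length) throws IOException
    {
        ByteBuffer buffer = ByteBuffer.allocate(length);
        channel.read(buffer, position);
        buffer.flip();
        return buffer;
    }

    // mmap path: the region is mapped into the process address space and
    // reads hit the page cache directly via page faults, no explicit copy.
    static ByteBuffer readMapped(FileChannel channel, long position, int length) throws IOException
    {
        return channel.map(FileChannel.MapMode.READ_ONLY, position, length);
    }

    public static void main(String[] args) throws IOException
    {
        try (FileChannel channel = FileChannel.open(Paths.get(args[0]), StandardOpenOption.READ))
        {
            int length = (int) Math.min(4096, channel.size());
            System.out.println(readBuffered(channel, 0, length).remaining());
            System.out.println(readMapped(channel, 0, length).remaining());
        }
    }
}
{code}
An in-process page cache would sit in front of both, which is why collapsing
to a single (buffered) path underneath it is attractive.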
As an aside, do we have a documented set of suggestions on how people should
approach stress-testing Cassandra, or perhaps a set of performance regression
tests we run against releases? Nothing beats specialized expertise in tuning
the stress workload to your expected usage patterns, but it might help to give
people a baseline and a starting point for their own testing.
Pavel: I did record perf runs of both the buffered and memory-mapped paths on
linux, but given how close the results above are, I don't know how much value
we'll be able to pull from them. I can attach them to the ticket if you're
still interested.
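(Those recordings were made with linux perf along these lines; the exact
options here are my assumption, not a transcript of the capture:)
{code}
# Assumed capture commands -- substitute the actual daemon pid.
perf record -g -p <cassandra-pid> -- sleep 60   # sample call stacks for 60s
perf report                                     # browse the recorded profile
{code}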
> Standardize on a single read path
> ---------------------------------
>
> Key: CASSANDRA-6890
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6890
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Joshua McKenzie
> Assignee: Joshua McKenzie
> Labels: performance
> Fix For: 3.0
>
> Attachments: mmap_gc.jpg, mmap_jstat.txt, mmap_perf.txt,
> nommap_gc.jpg, nommap_jstat.txt
>
>
> Since we actively unmap unreferenced SSTRs (SSTableReaders) and also copy
> data out of those readers on the read path, the current memory-mapped i/o is
> a lot of complexity for very little payoff. Clean out the mmap'ed i/o on the
> read path.