[
https://issues.apache.org/jira/browse/CASSANDRA-8894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14630928#comment-14630928
]
Stefania commented on CASSANDRA-8894:
-------------------------------------
[~benedict] I went ahead and implemented the latest suggested optimization in
[this commit|https://github.com/stef1927/cassandra/commit/ad6712cdc12380ef0529a13ed6e9bd1c5cecebad].
I've also attached tentative stress yaml profiles, which I intend to run like
this:
{code}
user profile=https://dl.dropboxusercontent.com/u/15683245/8894_tiny.yaml ops\(insert=1\) n=100000 -rate threads=50
user profile=https://dl.dropboxusercontent.com/u/15683245/8894_tiny.yaml ops\(singleblob=1\) n=100000 -rate threads=50
{code}
Can you confirm that the profiles are what you intended: basically a partition id
and a blob column with the size distributed as you previously indicated? I'm not
sure if there is anything else I should do to ensure reads mostly hit disk, other
than spreading the partition id across a big interval.
I created these additional branches:
- trunk-pre-8099
- 8894-pre-8099
- 8894-pre-8099-first-optim
- 8894-first-optim
The names are self-describing except for "first-optim", which means before
implementing the latest optimization. A tag would have been enough, but cstar
perf does not support tags.
Unfortunately cstar perf has been giving me more problems than just the lack of
tag support, cc [~enigmacurry]:
* The old pre-8099 trunk branches fail because of the schema table changes
(http://cstar.datastax.com/tests/id/e134ee7e-2c46-11e5-a180-42010af0688f):
"InvalidQueryException: Keyspace system_schema does not exist". However, I think
we should be OK if we fake version 2.2 in build.xml.
* The new branches either fail because of a nodetool failure
(http://cstar.datastax.com/tests/id/86abc144-2c55-11e5-87b9-42010af0688f) or
the graphs are wrong
(http://cstar.datastax.com/tests/id/11fe9c5a-2c45-11e5-9760-42010af0688f).
Here is the nodetool failure:
{code}
[10.200.241.104] Executing task 'ensure_running'
[10.200.241.104] run: JAVA_HOME=~/fab/jvms/jdk1.8.0_45 ~/fab/cassandra/bin/nodetool ring
[10.200.241.104] out: error: null
[10.200.241.104] out: -- StackTrace --
[10.200.241.104] out: java.util.NoSuchElementException
[10.200.241.104] out: at com.google.common.collect.LinkedHashMultimap$1.next(LinkedHashMultimap.java:506)
[10.200.241.104] out: at com.google.common.collect.LinkedHashMultimap$1.next(LinkedHashMultimap.java:494)
[10.200.241.104] out: at com.google.common.collect.TransformedIterator.next(TransformedIterator.java:48)
[10.200.241.104] out: at java.util.Collections.max(Collections.java:708)
[10.200.241.104] out: at org.apache.cassandra.tools.nodetool.Ring.execute(Ring.java:63)
[10.200.241.104] out: at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:240)
[10.200.241.104] out: at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:154)
[10.200.241.104] out:
[10.200.241.104] out:
{code}
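The exception itself is easy to reproduce: Collections.max calls iterator().next()
unconditionally, so it throws NoSuchElementException when handed an empty collection,
which looks like what Ring.execute hits here (presumably an empty token/endpoint
multimap). A minimal standalone sketch, with made-up names rather than the actual
Ring code:
{code}
import java.util.Collections;
import java.util.Comparator;

import com.google.common.collect.LinkedHashMultimap;

public class EmptyMaxRepro
{
    public static void main(String[] args)
    {
        // Hypothetical stand-in for whatever multimap Ring.execute iterates;
        // the only thing that matters for the repro is that it is empty.
        LinkedHashMultimap<String, String> tokensToEndpoints = LinkedHashMultimap.create();

        // Collections.max calls iterator().next() without checking hasNext(), so an
        // empty collection produces exactly the java.util.NoSuchElementException seen
        // above (LinkedHashMultimap$1.next -> TransformedIterator.next -> Collections.max).
        String longest = Collections.max(tokensToEndpoints.keys(),
                                         Comparator.comparingInt(String::length));
        System.out.println(longest);
    }
}
{code}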
I'll resume the performance tests once cstar perf is stable again.
> Our default buffer size for (uncompressed) buffered reads should be smaller,
> and based on the expected record size
> ------------------------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-8894
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8894
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Benedict
> Assignee: Stefania
> Labels: benedict-to-commit
> Fix For: 3.x
>
> Attachments: 8894_25pct.yaml, 8894_5pct.yaml, 8894_tiny.yaml
>
>
> A large contributor to slower buffered reads than mmapped is likely that we
> read a full 64Kb at once, when average record sizes may be as low as 140
> bytes on our stress tests. The TLB has only 128 entries on a modern core, and
> each read will touch 32 of these, meaning we will almost never hit the TLB and
> will incur at least 30 unnecessary misses each time (as well as the other costs
> of larger-than-necessary accesses). When
> working with an SSD there is little to no benefit reading more than 4Kb at
> once, and in either case reading more data than we need is wasteful. So, I
> propose selecting a buffer size that is the next larger power of 2 than our
> average record size (with a minimum of 4Kb), so that we expect to read each
> record in one operation. I also propose that we create a pool of these buffers up-front,
> and that we ensure they are all exactly aligned to a virtual page, so that
> the source and target operations each touch exactly one virtual page per 4Kb
> of expected record size.
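To make the proposed sizing rule concrete, here is a minimal sketch of the
arithmetic only (the class and method names, and the 64Kb cap, are my own
illustration rather than anything in the patch):
{code}
public class BufferSizeSketch
{
    // 4Kb floor from the proposal; the 64Kb cap is my assumption (the current default).
    private static final int MIN_BUFFER_SIZE = 1 << 12;
    private static final int MAX_BUFFER_SIZE = 1 << 16;

    // Next power of two at or above the average record size, clamped to [4Kb, 64Kb].
    static int bufferSizeFor(int averageRecordSize)
    {
        int size = Integer.highestOneBit(Math.max(averageRecordSize, 1));
        if (size < averageRecordSize)
            size <<= 1;
        return Math.min(Math.max(size, MIN_BUFFER_SIZE), MAX_BUFFER_SIZE);
    }

    public static void main(String[] args)
    {
        System.out.println(bufferSizeFor(140));   // 4096 -- a 140-byte average record still gets the 4Kb floor
        System.out.println(bufferSizeFor(6000));  // 8192
        // Page alignment of the pooled buffers is not handled here: ByteBuffer.allocateDirect
        // does not guarantee 4Kb alignment by default, so the proposed pool would need to
        // align the buffers it hands out itself.
    }
}
{code}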