[https://issues.apache.org/jira/browse/CASSANDRA-8894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14631108#comment-14631108]
Benedict commented on CASSANDRA-8894:
-------------------------------------
A few comments on the stress testing:
* The blob_id population doesn't need to be constrained (it defaults to
something like 1..100B)
* To perform the inserts, we want to ensure we construct a dataset large enough
to spill to disk, i.e. we probably want to insert at least 100M items (perhaps
200M+) if they're only ~50 bytes each.
* We probably want to run with slightly more threads, say 300
The graphs that were produced don't actually appear to be broken: the stress
run was simply extremely brief, since it only operated over 100K items :)
At the risk of sounding like a broken record, it can help to use the K, M, B
syntax for numbers in the profile and on the command line.
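To make the suffix convention and the dataset-size arithmetic concrete, here is a minimal Java sketch. `StressCounts`, `parseCount` and `rawBytes` are hypothetical names, not part of cassandra-stress, and the multipliers are assumed to be decimal (K = 10^3, M = 10^6, B = 10^9).

```java
// Hypothetical helper mirroring the K/M/B suffix convention.
// Assumption: multipliers are decimal (K = 10^3, M = 10^6, B = 10^9).
final class StressCounts {
    static long parseCount(String s) {
        String t = s.trim();
        if (t.isEmpty())
            throw new IllegalArgumentException("empty count");
        char last = Character.toUpperCase(t.charAt(t.length() - 1));
        long multiplier;
        switch (last) {
            case 'K': multiplier = 1_000L; break;
            case 'M': multiplier = 1_000_000L; break;
            case 'B': multiplier = 1_000_000_000L; break;
            default: return Long.parseLong(t); // plain number, no suffix
        }
        String digits = t.substring(0, t.length() - 1);
        return Math.multiplyExact(Long.parseLong(digits), multiplier);
    }

    // Raw payload size only; ignores per-row, index and on-disk overhead.
    static long rawBytes(long items, long bytesPerItem) {
        return Math.multiplyExact(items, bytesPerItem);
    }
}
```

For instance, 100M items at ~50 bytes each is roughly 5 GB of raw payload (about 10 GB at 200M), versus about 5 MB for a 100K-item run.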
> Our default buffer size for (uncompressed) buffered reads should be smaller,
> and based on the expected record size
> ------------------------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-8894
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8894
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Benedict
> Assignee: Stefania
> Labels: benedict-to-commit
> Fix For: 3.x
>
> Attachments: 8894_25pct.yaml, 8894_5pct.yaml, 8894_tiny.yaml
>
>
> A large contributor to buffered reads being slower than mmapped reads is
> likely that we read a full 64KB at once, when average record sizes may be as
> low as 140 bytes on our stress tests. The TLB has only 128 entries on a modern
> core, and each read will touch 32 of these (16 4KB pages each for the source
> and the target), meaning we will almost never hit the TLB and will incur at
> least 30 unnecessary misses each time (as well as the other costs of larger
> than necessary accesses). When working with an SSD there is little to no
> benefit to reading more than 4KB at once, and in either case reading more data
> than we need is wasteful. So, I propose selecting a buffer size that is the
> next power of 2 above our average record size (with a minimum of 4KB), so that
> we expect to read each record in one operation. I also propose that we create
> a pool of these buffers up-front, and that we ensure they are all exactly
> aligned to a virtual page, so that the source and target operations each touch
> exactly one virtual page per 4KB of expected record size.
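The proposal can be sketched in Java as below, assuming a 4KB virtual page and Java 9+ (for `ByteBuffer.alignedSlice`). `BufferSizing`, `AlignedBufferPool` and their methods are hypothetical names, not Cassandra's actual classes, and "next larger power of 2" is interpreted here as the smallest power of 2 at or above the average record size.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sizing rule: smallest power of 2 >= average record size, 4KB floor.
final class BufferSizing {
    static final int PAGE_SIZE = 4096; // assumed virtual page size

    // Assumes averageRecordSize <= 2^30, otherwise the shift overflows.
    static int bufferSizeFor(int averageRecordSize) {
        if (averageRecordSize <= PAGE_SIZE)
            return PAGE_SIZE;
        // For n >= 2, highestOneBit(n - 1) << 1 is the smallest power of 2 >= n.
        return Integer.highestOneBit(averageRecordSize - 1) << 1;
    }

    // Pages touched by one page-aligned read: one per 4KB on the source side
    // plus one per 4KB on the target side (64KB -> 32 pages, 4KB -> 2).
    static int pagesTouchedPerRead(int bufferSize) {
        int pagesPerSide = (bufferSize + PAGE_SIZE - 1) / PAGE_SIZE;
        return 2 * pagesPerSide;
    }
}

// Hypothetical pool of page-aligned direct buffers, allocated up-front.
final class AlignedBufferPool {
    private final int bufferSize;
    private final ConcurrentLinkedQueue<ByteBuffer> free = new ConcurrentLinkedQueue<>();

    AlignedBufferPool(int bufferSize, int count) {
        if (bufferSize <= 0 || bufferSize % BufferSizing.PAGE_SIZE != 0)
            throw new IllegalArgumentException("bufferSize must be a positive page multiple");
        this.bufferSize = bufferSize;
        for (int i = 0; i < count; i++)
            free.add(allocateAligned(bufferSize));
    }

    // Over-allocate by PAGE_SIZE - 1 bytes, then take the page-aligned slice.
    // Because size is a page multiple, the slice's capacity is exactly size.
    static ByteBuffer allocateAligned(int size) {
        ByteBuffer raw = ByteBuffer.allocateDirect(size + BufferSizing.PAGE_SIZE - 1);
        return raw.alignedSlice(BufferSizing.PAGE_SIZE);
    }

    // Falls back to a fresh aligned allocation if the pool is exhausted.
    ByteBuffer acquire() {
        ByteBuffer buf = free.poll();
        return buf != null ? buf : allocateAligned(bufferSize);
    }

    void release(ByteBuffer buf) {
        buf.clear(); // reset position/limit before reuse
        free.offer(buf);
    }
}
```

Over-allocating and slicing keeps the sketch within the standard `ByteBuffer` API; the slice holds a reference to its parent, so the underlying native memory stays alive while the pooled buffer is in use.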
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)