[ 
https://issues.apache.org/jira/browse/CASSANDRA-8894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-8894:
--------------------------------
    Description: A large contributor to buffered reads being slower than 
mmapped reads is likely that we read a full 64KB at once, when average record 
sizes may be as low as 140 bytes in our stress tests. The TLB has only 128 
entries on a modern core, and each read will touch 32 of these, meaning we will 
almost never hit the TLB, and will incur at least 30 unnecessary misses each 
time (as well as the other costs of larger-than-necessary accesses). When 
working with an SSD there is little to no benefit in reading more than 4KB at 
once, and in either case reading more data than we need is wasteful. So, I 
propose selecting a buffer size that is the next power of 2 above our average 
record size (with a minimum of 4KB), so that we expect to read a record in one 
operation. I also propose that we create a pool of these buffers up-front, and 
that we ensure they are all exactly aligned to a virtual page, so that the 
source and target operations each touch exactly one virtual page per 4KB of 
expected record size.  (was: A large contributor to buffered reads being slower 
than mmapped reads is likely that we read a full 64KB at once, when average 
record sizes may be as low as 140 bytes in our stress tests. The TLB has only 
128 entries on a modern core, and each read will touch 16 of these, meaning we 
will almost never hit the TLB, and will incur at least 15 unnecessary misses 
each time (as well as the other costs of larger-than-necessary accesses). When 
working with an SSD there is little to no benefit in reading more than 4KB at 
once, and in either case reading more data than we need is wasteful. So, I 
propose selecting a buffer size that is the next power of 2 above our average 
record size (with a minimum of 4KB), so that we expect to read a record in one 
operation. I also propose that we create a pool of these buffers up-front, and 
that we ensure they are all exactly aligned to a virtual page, so that the 
source and target operations each touch exactly one virtual page per 4KB of 
expected record size.)
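
The sizing rule in the description can be sketched roughly as follows. This is 
an illustration only, not code from the ticket or from Cassandra: the class and 
method names are hypothetical, the 4096-byte page size is assumed, and capping 
at the existing 64KB default is an added assumption. "Next power of 2" is read 
here as "smallest power of 2 at or above the average record size". The page 
count helper reflects one reading of the "32 entries" figure, namely that a 
read touches one set of pages on the source side and another on the target 
side.

```java
final class BufferSizing {
    // Assumed virtual page size; real systems may differ (e.g. huge pages).
    static final int PAGE_SIZE = 4096;
    // Assumption: never exceed the existing 64KB default mentioned in the ticket.
    static final int MAX_BUFFER_SIZE = 64 * 1024;

    /**
     * Smallest power of two at or above avgRecordSize,
     * clamped to the range [PAGE_SIZE, MAX_BUFFER_SIZE].
     */
    static int bufferSizeFor(long avgRecordSize) {
        if (avgRecordSize <= PAGE_SIZE) return PAGE_SIZE;
        if (avgRecordSize >= MAX_BUFFER_SIZE) return MAX_BUFFER_SIZE;
        int size = (int) avgRecordSize;           // safe: below MAX_BUFFER_SIZE
        int highest = Integer.highestOneBit(size);
        return highest == size ? size : highest << 1;
    }

    /**
     * Pages touched by one read of bufferSize bytes, counting the source
     * and the target separately (hypothetical model of the TLB cost).
     */
    static int pagesTouched(int bufferSize) {
        int perSide = (bufferSize + PAGE_SIZE - 1) / PAGE_SIZE;
        return 2 * perSide;
    }
}
```

Under these assumptions, a 140-byte average record maps to a single 4KB buffer 
(2 pages touched per read), whereas the current 64KB buffer touches 32.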

> Our default buffer size for (uncompressed) buffered reads should be smaller, 
> and based on the expected record size
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8894
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8894
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Assignee: Benedict
>             Fix For: 3.0
>
>
> A large contributor to buffered reads being slower than mmapped reads is 
> likely that we read a full 64KB at once, when average record sizes may be as 
> low as 140 bytes in our stress tests. The TLB has only 128 entries on a 
> modern core, and each read will touch 32 of these, meaning we will almost 
> never hit the TLB, and will incur at least 30 unnecessary misses each time 
> (as well as the other costs of larger-than-necessary accesses). When working 
> with an SSD there is little to no benefit in reading more than 4KB at once, 
> and in either case reading more data than we need is wasteful. So, I propose 
> selecting a buffer size that is the next power of 2 above our average record 
> size (with a minimum of 4KB), so that we expect to read a record in one 
> operation. I also propose that we create a pool of these buffers up-front, 
> and that we ensure they are all exactly aligned to a virtual page, so that 
> the source and target operations each touch exactly one virtual page per 4KB 
> of expected record size.
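
A pool of page-aligned buffers created up-front, as proposed above, might look 
something like the sketch below. It is a minimal illustration under stated 
assumptions, not the implementation attached to this ticket: the class name 
AlignedBufferPool and its methods are hypothetical, the page size is assumed to 
be 4096 bytes, the pool is not thread-safe, and it relies on 
ByteBuffer.alignedSlice, which requires Java 9 or later.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;

final class AlignedBufferPool {
    static final int PAGE_SIZE = 4096; // assumed virtual page size

    private final ArrayDeque<ByteBuffer> free = new ArrayDeque<>();

    /** Allocates count buffers of bufferSize bytes, each starting on a page boundary. */
    AlignedBufferPool(int bufferSize, int count) {
        if (bufferSize <= 0 || bufferSize % PAGE_SIZE != 0)
            throw new IllegalArgumentException("bufferSize must be a positive multiple of the page size");
        if (count <= 0)
            throw new IllegalArgumentException("count must be positive");

        // One slab with a spare page, so an aligned region of the full size always fits.
        int slabSize = Math.addExact(Math.multiplyExact(bufferSize, count), PAGE_SIZE);
        ByteBuffer slab = ByteBuffer.allocateDirect(slabSize);
        ByteBuffer aligned = slab.alignedSlice(PAGE_SIZE);

        // Carve the aligned region into equal, non-overlapping slices.
        for (int i = 0; i < count; i++) {
            ByteBuffer view = aligned.duplicate();
            view.limit((i + 1) * bufferSize);
            view.position(i * bufferSize);
            free.push(view.slice());
        }
    }

    /** Returns a cleared buffer, or null if the pool is exhausted. */
    ByteBuffer acquire() {
        ByteBuffer buffer = free.poll();
        if (buffer != null) buffer.clear();
        return buffer;
    }

    /** Returns a buffer previously obtained from acquire() to the pool. */
    void release(ByteBuffer buffer) {
        free.push(buffer);
    }

    int available() {
        return free.size();
    }
}
```

Because every slice starts on a page boundary and its size is a multiple of the 
page size, a read into one of these buffers touches exactly bufferSize / 4096 
pages on the target side, which is the property the description asks for.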



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
