[ https://issues.apache.org/jira/browse/CASSANDRA-8894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345295#comment-14345295 ]
Jonathan Ellis edited comment on CASSANDRA-8894 at 3/3/15 4:35 PM:
-------------------------------------------------------------------

bq. I propose selecting a buffer size that is the next larger power of 2 than our average record size (with a minimum of 4Kb), so that we expect to read in one operation.

Makes sense to me.

bq. I also propose that we create a pool of these buffers up-front

Sharing buffers across files is tricky because of the internals of RandomAccessReader. Maybe this should be a separate ticket.


was (Author: jbellis):
bq. I propose selecting a buffer size that is the next larger power of 2 than our average record size (with a minimum of 4Kb), so that we expect to read in one operation.

Makes sense to me.

> I also propose that we create a pool of these buffers up-front

Sharing buffers across files is tricky because of the internals of RandomAccessReader. Maybe this should be a separate ticket.

> Our default buffer size for (uncompressed) buffered reads should be smaller, and based on the expected record size
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8894
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8894
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Assignee: Benedict
>             Fix For: 3.0
>
>
> A large contributor to slower buffered reads than mmapped is likely that we
> read a full 64Kb at once, when average record sizes may be as low as 140
> bytes on our stress tests. The TLB has only 128 entries on a modern core, and
> each read will touch 32 of these, meaning we will almost never hit the TLB,
> and will be incurring at least 30 unnecessary misses each time (as well as
> the other costs of larger than necessary accesses). When working with an SSD
> there is little to no benefit reading more than 4Kb at once, and in either
> case reading more data than we need is wasteful.
>
> So, I propose selecting a buffer size that is the next larger power of 2
> than our average record size (with a minimum of 4Kb), so that we expect to
> read in one operation. I also propose that we create a pool of these buffers
> up-front, and that we ensure they are all exactly aligned to a virtual page,
> so that the source and target operations each touch exactly one virtual page
> per 4Kb of expected record size.
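
The sizing rule in the description can be sketched in a few lines of Java. This is only an illustration, not code from the ticket or from Cassandra: the class and method names are hypothetical, the page size is assumed to be 4096 bytes, "next larger power of 2" is read as "smallest power of two at or above the average record size", and the page count assumes the 32-entry figure comes from 16 source pages plus 16 target pages for a 64Kb read.

```java
// Hypothetical helper; names are illustrative and not taken from Cassandra.
final class BufferSizing {
    // Assumed virtual page size, matching the 4Kb minimum in the description.
    static final int PAGE_SIZE = 4096;

    // Smallest power of two >= avgRecordSize, never below one page.
    static int bufferSizeFor(int avgRecordSize) {
        if (avgRecordSize < 0 || avgRecordSize > (1 << 30)) {
            throw new IllegalArgumentException("unsupported record size: " + avgRecordSize);
        }
        if (avgRecordSize <= PAGE_SIZE) {
            return PAGE_SIZE;
        }
        // For n > 1, highestOneBit(n - 1) << 1 rounds n up to a power of two.
        return Integer.highestOneBit(avgRecordSize - 1) << 1;
    }

    // Pages touched by one read, counting source and target separately
    // (an assumption about how the "32 entries" figure was derived).
    static int pagesTouched(int bufferSize) {
        int pagesPerSide = (bufferSize + PAGE_SIZE - 1) / PAGE_SIZE;
        return 2 * pagesPerSide;
    }

    public static void main(String[] args) {
        System.out.println(bufferSizeFor(140));   // 140-byte records from the stress test
        System.out.println(pagesTouched(65536));  // current 64Kb default
        System.out.println(pagesTouched(bufferSizeFor(140)));
    }
}
```

Under these assumptions a 140-byte average record gets a single 4Kb buffer, and a read touches 2 pages rather than the 32 touched by a 64Kb read. Pooling and page alignment of the buffers are not covered by this sketch.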