[
https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13945607#comment-13945607
]
Pavel Yaskevich commented on CASSANDRA-6746:
--------------------------------------------
bq. In practice, moving the WILLNEED into the getSegment() call is dangerous as
the segment is used past the initial 64Kb, and if we rely on ourselves only for
read-ahead this could result in very substandard performance for larger rows.
We also probably want to only WILLNEED the actual size of the buffer we expect
to read for compressed files.
Yes, this is only a PoC to see whether the scheme works for platters. Just a
couple of things: for optimal performance we need information from the index
about the size of the row, so we can mark SEQUENTIAL a) the whole row, if the
row is smaller than the indexing threshold, or b) portions of the row on the
index boundaries. The original 1-page WILLNEED (very conservative) is there to
make sure the read can quickly grab the first portion of the buffer while the
extended read-ahead prefetches everything else. This still works for big rows
because we are forced to read the header of the row first (the key at least),
and then, when we seek() to the position indicated by the column index, we hint
that we are going to read that portion of the row; so large rows suffer more
from the fact that we have to over-buffer than from WILLNEED. I wish we had a
useful mmap'ed buffer implementation, so we could madvise the way we currently
fadvise and fadvise would no longer be required...
There is a way to solve the cold-cache problem for the parts of the original
SSTables that have been read before: I did some work with mincore() previously
and can revisit it if needed. The problem we are trying to solve by dropping
the cache for memtables and compacted SSTables (on memory-restricted and/or
slow-I/O systems) is that keeping the page cache for the old files creates more
jitter and slows down warmup of the newly created SSTable.
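The mincore() idea mentioned above can be sketched as follows: map a region of an old SSTable, ask the kernel which of its pages are resident, and use that fraction to decide whether dropping the cache would actually free anything worth freeing. The function name and how the caller uses the returned fraction are assumptions for illustration, not the actual prior work.

```c
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Return the fraction of pages in [offset, offset+length) of fd that
 * are currently resident in the page cache, or -1.0 on error.
 * offset must be page-aligned; length must be > 0. Sketch only. */
static double resident_fraction(int fd, off_t offset, size_t length)
{
    /* mincore() needs a mapping; PROT_NONE is enough to query it. */
    void *addr = mmap(NULL, length, PROT_NONE, MAP_SHARED, fd, offset);
    if (addr == MAP_FAILED)
        return -1.0;

    long page = sysconf(_SC_PAGESIZE);
    size_t npages = (length + page - 1) / page;
    unsigned char *vec = malloc(npages);
    size_t resident = 0;

    if (vec && mincore(addr, length, vec) == 0)
        for (size_t i = 0; i < npages; i++)
            resident += vec[i] & 1;  /* low bit = page is resident */

    free(vec);
    munmap(addr, length);
    return (double)resident / npages;
}
```

A caller could then, for example, skip the DONTNEED drop when most of an old SSTable's pages are already resident and still being hit, keeping warm data warm while still evicting genuinely cold files.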
> Reads have a slow ramp up in speed
> ----------------------------------
>
> Key: CASSANDRA-6746
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6746
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Reporter: Ryan McGuire
> Assignee: Benedict
> Labels: performance
> Fix For: 2.1 beta2
>
> Attachments: 2.1_vs_2.0_read.png, 6746-buffered-io-tweaks.png,
> 6746-patched.png, 6746.blockdev_setra.full.png,
> 6746.blockdev_setra.zoomed.png, 6746.buffered_io_tweaks.logs.tar.gz,
> 6746.buffered_io_tweaks.write-flush-compact-mixed.png,
> 6746.buffered_io_tweaks.write-read-flush-compact.png, 6746.txt,
> buffered-io-tweaks.patch, cassandra-2.0-bdplab-trial-fincore.tar.bz2,
> cassandra-2.1-bdplab-trial-fincore.tar.bz2
>
>
> On a physical four-node cluster I am doing a big write and then a big read.
> The read takes a long time to ramp up to respectable speeds.
> !2.1_vs_2.0_read.png!
> [See data
> here|http://ryanmcguire.info/ds/graph/graph.html?stats=stats.2.1_vs_2.0_vs_1.2.retry1.json&metric=interval_op_rate&operation=stress-read&smoothing=1]
--
This message was sent by Atlassian JIRA
(v6.2#6252)