[ 
https://issues.apache.org/jira/browse/CASSANDRA-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012867#comment-13012867
 ] 

Peter Schuller commented on CASSANDRA-1902:
-------------------------------------------

bq. Documentation for posix_fadvise/madvise calls suggests making more frequent 
small requests instead of big requests - the kernel is highly likely to ignore 
advice on a big region.

I suspect this comes from WILLNEED being intended as a read-ahead hint. The 
implementation puts a maximum cap on the amount of read-ahead it'll do:

  http://lxr.free-electrons.com/source/mm/fadvise.c#L95
  http://lxr.free-electrons.com/source/mm/readahead.c#L203
  http://lxr.free-electrons.com/source/mm/readahead.c#L236

My knowledge of the VM subsystem fails me but my interpretation of 
max_sane_readahead() is that it's trying to essentially cap based on available 
memory resources such that it may or may not do the read-ahead (or do it to a 
lesser extent) based on memory pressure.
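
To make the "many small requests" idea concrete, here is a minimal sketch that issues WILLNEED over a file in fixed-size pieces. The helper name willneed_in_chunks and the 128 KiB chunk size are assumptions picked for illustration; the chunk size is not derived from the kernel's actual read-ahead cap, and this is not what the attached patches do.

```python
import os


def willneed_in_chunks(fd, length, chunk=128 * 1024, offset=0):
    """Issue POSIX_FADV_WILLNEED over [offset, offset + length) piece by piece.

    The chunk size is an arbitrary illustrative value, not the kernel's cap.
    Returns the list of (offset, size) ranges that were hinted.
    """
    ranges = []
    end = offset + length
    pos = offset
    while pos < end:
        size = min(chunk, end - pos)
        # Each call is only a hint; the kernel may do less read-ahead
        # than requested (or none) for any given range.
        os.posix_fadvise(fd, pos, size, os.POSIX_FADV_WILLNEED)
        ranges.append((pos, size))
        pos += size
    return ranges
```

Whether smaller requests are actually honored more often than one big request would still need to be confirmed on the kernel in question.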

This is different from e.g. DONTNEED which is fine to call for huge segments 
and seems to be O(n) (as per CASSANDRA-1470 discussion).

Btw, it's also possible to populate with an mmap() call with MAP_POPULATE 
(Linux-specific, but we're totally Linux specific anyway). I haven't checked 
but empirically I suspect this is not subject to limits on how much you 
populate - and the call is also blocking.
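
For reference, a rough sketch of the MAP_POPULATE variant. The helper name map_populated is hypothetical, and mmap.MAP_POPULATE is only exposed by newer Python versions on Linux, so the sketch falls back to a plain mapping (no pre-population) when the flag is missing.

```python
import mmap
import os

# Assumption: fall back to 0 (a plain, lazily faulted mapping) when the
# running Python/platform does not expose MAP_POPULATE.
MAP_POPULATE = getattr(mmap, "MAP_POPULATE", 0)


def map_populated(path):
    """Map a non-empty file read-only, asking the kernel to pre-fault it."""
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        # With MAP_POPULATE set, this call blocks while pages are read in.
        return mmap.mmap(fd, size,
                         flags=mmap.MAP_SHARED | MAP_POPULATE,
                         prot=mmap.PROT_READ)
    finally:
        # The mmap object keeps its own duplicate of the descriptor.
        os.close(fd)
```

Note that the blocking happens inside the mmap() call itself, so the caller pays for the read-in up front rather than in the background.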

bq. Note that WILLNEED is non-blocking call.

Implementation:

  http://lxr.free-electrons.com/source/mm/readahead.c#L145

which boils down to a call to:

  http://lxr.free-electrons.com/source/mm/readahead.c#L109

So yes it's non-blocking, but you're still either going to take the random I/O 
(remember that as of CASSANDRA-1470 we know nothing will be in page cache) or 
else requests would be dropped somewhere (causing less hotness). If you're 
taking the I/O, again either the I/O is done in such a way as to essentially be 
serial or else it's done with some level of parallelism. Which of the two it is 
will affect the impact it has on other I/O.

bq. We can't stop using DONTNEED while writing compacted file because it will 
suck pages from sstables which are currently in use. And we do WILLNEED's only 
when we have SSTableReader for a compacted file ready - right before old 
sstables going to be replaced with new ones so this is not going to make a big 
performance impact on the reads. 

The whole process is going to be suboptimal yes since both the old and the new 
sstable will be cached. But note that even though the read-ahead is done at the 
point of the switch, the actual freeing of cached pages associated with the old 
sstable is still happening at some point in the future as a result of GC 
(right? or am I misreading?).

My thinking has been that something along the lines of CASSANDRA-1608 and 
capping sstable sizes may help mitigate these effects at some point in the 
future. Meanwhile, the only way I see to avoid double caching is to DONTNEED 
the old table first, but that has the obvious effects on live traffic.
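
As a sketch of the "DONTNEED the old table first" option, assuming a hypothetical helper name drop_from_page_cache: passing a length of 0 to posix_fadvise means "through the end of the file", so a single call covers the whole sstable.

```python
import os


def drop_from_page_cache(path):
    """Hint that cached pages for the whole file are no longer needed.

    Returns the file size in bytes, i.e. how much the hint covered.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        # offset=0, len=0 covers the file from the start to its end.
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
        return size
    finally:
        os.close(fd)
```

Calling this on the old sstable before the switch avoids double caching, but any live reads still hitting that file go back to disk.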

There is definitely a trade-off here and the timing is different in the 
WILLNEED-just-before-switch case, maybe leading to more data being retained 
hot, but at the cost of all that seek-bound I/O (or am I wrong about that bit?) 
which, for large sstables, will happen over some non-trivial amount of time.

Anyways... there are lots of details and they're difficult to express in an 
organized fashion. I think the main thing at least is to just be explicitly 
aware of the seek-bound I/O implied by compaction and probably confirm what the 
behavior actually is for a large sstable - specifically whether it will cause a 
storm of highly concurrent I/O with an insanely high queue depth (= *very* bad 
for live reads) or a slower mostly serial background read-in (= much better for 
live reads) or somewhere in between.
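
One rough way to check this empirically would be to sample the number of in-flight requests for the data disk while the WILLNEED calls are being issued. The sketch below parses /proc/diskstats text, where the ninth per-device counter is "I/Os currently in progress". The function name in_flight_ios is hypothetical, and the in-flight count is only a proxy for queue depth.

```python
def in_flight_ios(diskstats_text, device):
    """Return the 'I/Os currently in progress' counter for a device.

    diskstats_text is the content of /proc/diskstats; returns None if the
    device is not listed.
    """
    for line in diskstats_text.splitlines():
        fields = line.split()
        # Layout: major, minor, name, then the per-device counters.
        # The 9th counter (index 11 overall) is I/Os currently in progress.
        if len(fields) >= 12 and fields[2] == device:
            return int(fields[11])
    return None
```

Sampling this in a loop (for example with open("/proc/diskstats").read() and a device name such as "sda", which is an assumption here) during the switch should show whether the read-in looks like a storm or a mostly serial trickle.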

(I'll try to look into that, subject to time.)


> Migrate cached pages during compaction 
> ---------------------------------------
>
>                 Key: CASSANDRA-1902
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1902
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7.1
>            Reporter: T Jake Luciani
>            Assignee: T Jake Luciani
>             Fix For: 0.7.5, 0.8
>
>         Attachments: 
> 0001-CASSANDRA-1902-cache-migration-impl-with-config-option.txt, 
> 1902-formatted.txt, 1902-per-column-migration-rebase2.txt, 
> 1902-per-column-migration.txt, CASSANDRA-1902-v3.patch, 
> CASSANDRA-1902-v4.patch, CASSANDRA-1902-v5.patch
>
>   Original Estimate: 32h
>          Time Spent: 56h
>  Remaining Estimate: 0h
>
> Post CASSANDRA-1470 there is an opportunity to migrate cached pages from a 
> pre-compacted CF during the compaction process.  This is now important since 
> CASSANDRA-1470 caches effectively nothing.  
> For example an active CF being compacted hurts reads since nothing is cached 
> in the new SSTable. 
> The purpose of this ticket then is to make sure SOME data is cached from 
> active CFs. This can be done by monitoring which Old SSTables are in the page 
> cache and caching active rows in the New SSTable.
> A simpler yet similar approach is described here: 
> http://insights.oetiker.ch/linux/fadvise/

