[ https://issues.apache.org/jira/browse/CASSANDRA-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012867#comment-13012867 ]
Peter Schuller commented on CASSANDRA-1902:
-------------------------------------------

bq. Documentation for posix_fadvise/madvise calls suggests making more frequent, smaller requests instead of big ones; the kernel is very likely to ignore advice on a big region.

I suspect this comes from WILLNEED being intended as a read-ahead hint. The implementation puts a maximum cap on the amount of read-ahead it'll do:

http://lxr.free-electrons.com/source/mm/fadvise.c#L95
http://lxr.free-electrons.com/source/mm/readahead.c#L203
http://lxr.free-electrons.com/source/mm/readahead.c#L236

My knowledge of the VM subsystem fails me, but my interpretation of max_sane_readahead() is that it essentially caps the read-ahead based on available memory resources. It may or may not do the read-ahead (or do it to a lesser extent) depending on memory pressure. This is different from e.g. DONTNEED, which is fine to call for huge segments and seems to be O(n) (as per the CASSANDRA-1470 discussion).

Btw, it's also possible to populate with an mmap() call with MAP_POPULATE (Linux-specific, but we're totally Linux-specific anyway). I haven't checked, but empirically I suspect this is not subject to limits on how much you populate, and the call is also blocking.

bq. Note that WILLNEED is a non-blocking call.

Implementation:

http://lxr.free-electrons.com/source/mm/readahead.c#L145

which boils down to a call to:

http://lxr.free-electrons.com/source/mm/readahead.c#L109

So yes, it's non-blocking. But you're still either going to take the random I/O (remember that as of CASSANDRA-1470 we know nothing will be in page cache) or else requests will be dropped somewhere (causing less hotness). If you're taking the I/O, either the I/O is done in such a way as to be essentially serial, or else it's done with some level of parallelism. Which one it is will affect the impact it has on other I/O.
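The two ways of pre-populating the page cache mentioned above can be sketched in C. The first issues WILLNEED as many small posix_fadvise() calls, per the quoted suggestion. The second maps the file with MAP_POPULATE. This is a minimal sketch, not Cassandra's code. The names willneed_chunked, map_populated and WILLNEED_CHUNK are hypothetical, and the 1 MiB chunk size is an arbitrary assumption. Nothing here overrides the kernel's read-ahead cap discussed above.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical chunk size for each WILLNEED call; 1 MiB is an arbitrary
 * assumption, not a value taken from the kernel or from Cassandra. */
#define WILLNEED_CHUNK ((off_t)1 << 20)

/* Advise POSIX_FADV_WILLNEED over [offset, offset + len) as a series of
 * small requests rather than one large one. Each call queues read-ahead
 * and returns without waiting for the reads to complete.
 * Returns 0 on success, or the error number from posix_fadvise(), which
 * reports errors through its return value rather than through errno. */
int willneed_chunked(int fd, off_t offset, off_t len)
{
    const off_t end = offset + len;
    for (off_t pos = offset; pos < end; pos += WILLNEED_CHUNK) {
        off_t n = end - pos;
        if (n > WILLNEED_CHUNK)
            n = WILLNEED_CHUNK;
        int rc = posix_fadvise(fd, pos, n, POSIX_FADV_WILLNEED);
        if (rc != 0)
            return rc;
    }
    return 0;
}

/* Alternative (Linux-specific): map the whole file with MAP_POPULATE so
 * the kernel prefaults the mapping while mmap() runs. mmap() does not
 * fail if the population itself is incomplete. Returns the mapping
 * (release it with munmap()) or MAP_FAILED; *len_out gets the length. */
void *map_populated(int fd, size_t *len_out)
{
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0)
        return MAP_FAILED;
    *len_out = (size_t)st.st_size;
    return mmap(NULL, *len_out, PROT_READ, MAP_SHARED | MAP_POPULATE, fd, 0);
}
```

The two differ in where the waiting happens. The chunked advice returns immediately and leaves the scheduling of the resulting reads to the kernel. The MAP_POPULATE variant blocks the calling thread, which at least lets the caller pace the read-in from its side.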
bq. We can't stop using DONTNEED while writing the compacted file, because otherwise writing it would suck pages away from the sstables which are currently in use. And we only do WILLNEEDs once we have an SSTableReader for the compacted file ready, right before the old sstables are replaced with the new ones, so this is not going to have a big performance impact on reads.

The whole process is going to be suboptimal, yes, since both the old and the new sstable will be cached. But note that even though the read-ahead is done at the point of the switch, the actual freeing of cached pages associated with the old sstable still happens at some point in the future as a result of GC (right? or am I misreading?). My thinking has been that something along the lines of CASSANDRA-1608 and capping sstable sizes may help mitigate these effects at some point in the future. Meanwhile, the only way I see to avoid double caching is to DONTNEED the old table first, but that has obvious effects on live traffic.

There is definitely a trade-off here, and the timing is different in the WILLNEED-just-before-switch case. It may lead to more data being retained hot, but at the cost of all that seek-bound I/O (or am I wrong about that bit?), which, for large sstables, will happen over some non-trivial amount of time.

Anyway... there are lots of details and they're difficult to express in an organized fashion. I think the main thing, at least, is to be explicitly aware of the seek-bound I/O implied by compaction. We should probably also confirm what the behavior actually is for a large sstable. Specifically: will it cause a storm of highly concurrent I/O with an insanely high queue depth (= *very* bad for live reads), a slower, mostly serial background read-in (= much better for live reads), or something in between? (I'll try to look into that, subject to time.)
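The quoted approach keeps a compaction's output from evicting the live sstables by advising DONTNEED while the compacted file is being written. Here is a minimal C sketch of that pattern, assuming a plain write(2)-based writer. The dropping_writer type, the dw_write/dw_finish functions and the 8 MiB interval are hypothetical, not Cassandra's implementation. DONTNEED generally does not discard dirty pages, so the sketch syncs the written range before advising on it.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical interval: drop the written range from the page cache after
 * every 8 MiB of output. The value is an assumption for illustration. */
#define DONTNEED_INTERVAL ((off_t)8 << 20)

/* Hypothetical writer for a compaction output file. */
typedef struct {
    int fd;
    off_t written;  /* bytes written so far */
    off_t dropped;  /* prefix of the file already advised DONTNEED */
} dropping_writer;

/* Flush and drop everything written since the last drop. Dirty pages are
 * synced first because DONTNEED does not discard them; pages only
 * partially covered by the range may stay cached.
 * Returns 0, or -1 with errno set. */
static int drop_written(dropping_writer *w)
{
    if (w->written == w->dropped)
        return 0;
    if (fdatasync(w->fd) != 0)
        return -1;
    int rc = posix_fadvise(w->fd, w->dropped, w->written - w->dropped,
                           POSIX_FADV_DONTNEED);
    if (rc != 0) {
        errno = rc; /* posix_fadvise() returns the error instead of setting errno */
        return -1;
    }
    w->dropped = w->written;
    return 0;
}

/* Append buf to the file, dropping cached output once enough has built up.
 * Returns 0, or -1 with errno set. */
int dw_write(dropping_writer *w, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(w->fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
        w->written += n;
    }
    if (w->written - w->dropped >= DONTNEED_INTERVAL)
        return drop_written(w);
    return 0;
}

/* Drop whatever remains once the compacted file is complete. */
int dw_finish(dropping_writer *w)
{
    return drop_written(w);
}
```

The periodic fdatasync() trades some write throughput for a bounded amount of the new file's data in the page cache at any time.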
> Migrate cached pages during compaction
> ---------------------------------------
>
>                 Key: CASSANDRA-1902
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1902
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7.1
>            Reporter: T Jake Luciani
>            Assignee: T Jake Luciani
>             Fix For: 0.7.5, 0.8
>
>         Attachments:
> 0001-CASSANDRA-1902-cache-migration-impl-with-config-option.txt,
> 1902-formatted.txt, 1902-per-column-migration-rebase2.txt,
> 1902-per-column-migration.txt, CASSANDRA-1902-v3.patch,
> CASSANDRA-1902-v4.patch, CASSANDRA-1902-v5.patch
>
>   Original Estimate: 32h
>          Time Spent: 56h
>  Remaining Estimate: 0h
>
> Post CASSANDRA-1470 there is an opportunity to migrate cached pages from a
> pre-compacted CF during the compaction process. This is now important since
> CASSANDRA-1470 caches effectively nothing.
> For example, an active CF being compacted hurts reads since nothing is cached
> in the new SSTable.
> The purpose of this ticket, then, is to make sure SOME data is cached from
> active CFs. This can be done by monitoring which old SSTables are in the page
> cache and caching active rows in the new SSTable.
> A simpler yet similar approach is described here:
> http://insights.oetiker.ch/linux/fadvise/

-- 
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
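The monitoring idea in the ticket description can be illustrated with a minimal C sketch:

1. Map the old sstable and ask mincore(2) which of its pages are resident.
2. For rows whose old range is hot, advise WILLNEED on the matching range of the new sstable.

This is an illustration under stated assumptions, not the patch attached to the ticket, and not necessarily the approach at the oetiker link. The row_location type, the old-to-new offset mapping and migrate_hot_rows are hypothetical. A real implementation would collect row positions while writing the compacted file.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical record of where one row sits in the old (pre-compaction)
 * sstable and in the new one; a real implementation would collect this
 * while writing the compacted file. */
typedef struct {
    off_t old_off, old_len;
    off_t new_off, new_len;
} row_location;

/* 1 if any page overlapping [off, off + len) is resident according to
 * the mincore() vector, 0 otherwise. Requires len > 0. */
static int range_resident(const unsigned char *vec, size_t page,
                          off_t off, off_t len)
{
    for (size_t p = (size_t)off / page; p <= (size_t)(off + len - 1) / page; p++)
        if (vec[p] & 1)
            return 1;
    return 0;
}

/* For every row whose old range has at least one page in the page cache,
 * advise WILLNEED on its range in the new sstable. Returns the number of
 * rows advised, or -1 on error. */
int migrate_hot_rows(int old_fd, int new_fd,
                     const row_location *rows, size_t nrows)
{
    struct stat st;
    if (fstat(old_fd, &st) != 0)
        return -1;
    if (st.st_size == 0)
        return 0;

    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = (size_t)st.st_size;
    /* Mapping without touching the pages does not read them in. */
    void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, old_fd, 0);
    if (map == MAP_FAILED)
        return -1;
    unsigned char *vec = malloc((len + page - 1) / page);
    if (vec == NULL) {
        munmap(map, len);
        return -1;
    }

    int advised = -1;
    if (mincore(map, len, vec) == 0) {
        advised = 0;
        for (size_t i = 0; i < nrows; i++) {
            const row_location *r = &rows[i];
            /* Skip empty ranges and rows outside the old file's mapping. */
            if (r->old_off < 0 || r->old_len <= 0 || r->new_len <= 0 ||
                r->old_off + r->old_len > st.st_size)
                continue;
            if (range_resident(vec, page, r->old_off, r->old_len) &&
                posix_fadvise(new_fd, r->new_off, r->new_len,
                              POSIX_FADV_WILLNEED) == 0)
                advised++;
        }
    }
    free(vec);
    munmap(map, len);
    return advised;
}
```

Treating a row as hot when any one of its pages is resident is a deliberately loose heuristic chosen for the sketch. A stricter variant could require a minimum fraction of resident pages.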