[ 
https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915658#comment-13915658
 ] 

Benedict commented on CASSANDRA-6746:
-------------------------------------


2.0:
 INFO [main] 2014-02-27 18:54:12,724 CLibrary.java (line 63) JNA not found. Native methods will be disabled.
2.1:
INFO  [main] 2014-02-27 19:51:10,440 CLibrary.java:117 - JNA mlockall successful

The result is that we do not "skip IO cache" in 2.0, but we do in 2.1. Assuming 
the OS actually honours the DONTNEED advice, this leaves the page cache empty 
even when the OS has room to keep the pages. This is OS dependent: when testing 
on my own box, I found my OS kept the pages cached anyway. On searching I 
found that this behaviour was modified in the Linux kernel sometime around 
2010/11, as referenced 
[here|http://lists.samba.org/archive/rsync/2010-November/025827.html]. It's not 
clear which kernel version the change first made it into, but clearly it is not 
in the build on this cluster and is on my laptop.
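
For reference, the DONTNEED hint discussed here is the POSIX_FADV_DONTNEED 
advice of posix_fadvise. The following is only a minimal sketch of issuing 
that hint, written in Python for brevity; it is not Cassandra's code, and the 
function names drop_from_page_cache and write_then_advise are hypothetical. 
It assumes a Unix platform that exposes os.posix_fadvise. Whether the kernel 
actually evicts the pages is up to the kernel.

```python
import os
import tempfile


def drop_from_page_cache(path):
    """Advise the kernel that cached pages of `path` are no longer needed.

    This is only a hint; the kernel may or may not evict the pages.
    """
    if not hasattr(os, "posix_fadvise"):
        return  # platform without posix_fadvise: nothing to do
    fd = os.open(path, os.O_RDONLY)
    try:
        # offset=0 and length=0 together mean "the whole file".
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)


def write_then_advise(payload):
    """Write `payload` to a temp file, issue DONTNEED, then read it back."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())  # dirty pages cannot be dropped, so sync first
        path = f.name
    try:
        drop_from_page_cache(path)
        # The data is still readable; at worst it is re-read from disk.
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.unlink(path)
```

The hint never affects correctness, only whether a later read is served from 
memory or from disk, which is why its effect shows up as a slow read ramp-up 
rather than as an error.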

Note this is confirmed by fincore on the data files: in 2.0 all files remain 
100% cached at all times; in 2.1 they drop to 0% cached immediately after 
compaction completes.

I'm not sure what the correct response to this is. Largely this is simply 
behaving as expected, except that issuing a DONTNEED when we probably DO need 
the data is not a great idea. The rationale, of course, is that if we're 
compacting stale data we don't want to pollute the page cache; but if we're 
compacting live data we actively destroy the page cache when the OS follows 
the DONTNEED strictly (which in this case it apparently does, even though it 
has plenty of room to ignore us). Unless we can be smarter about issuing these 
commands, I don't think issuing them is a good idea, at least not on kernel 
versions that exhibit this behaviour.

However, I'm not convinced they make sense on newer kernel versions either: 
live reads going to files that are about to be discarded could still leave us 
with an empty buffer cache, because the new pages are evicted before they 
become the live versions. In this scenario, simply letting the kernel keep 
whatever pages it wants is probably best, so that there's no sudden 
performance cliff. Moving to incremental opening of the new sstables might be 
a better solution, so that reads transfer progressively, always to data that 
is already in the buffer cache, keeping the transition smooth.


> Reads have a slow ramp up in speed
> ----------------------------------
>
>                 Key: CASSANDRA-6746
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6746
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Ryan McGuire
>            Assignee: Benedict
>              Labels: performance
>             Fix For: 2.1 beta2
>
>         Attachments: 2.1_vs_2.0_read.png
>
>
> On a physical four node cluster I am doing a big write and then a big read. 
> The read takes a long time to ramp up to respectable speeds.
> !2.1_vs_2.0_read.png!
> [See data 
> here|http://ryanmcguire.info/ds/graph/graph.html?stats=stats.2.1_vs_2.0_vs_1.2.retry1.json&metric=interval_op_rate&operation=stress-read&smoothing=1]



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
