[ https://issues.apache.org/jira/browse/CASSANDRA-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013665#comment-13013665 ]

Peter Schuller commented on CASSANDRA-1902:
-------------------------------------------

So as an example: suppose you have live read traffic that results in an 
average queue size of 0.5. Assume for simplicity a single underlying disk that 
services exactly one request at a time and does no optimization for locality 
on disk. An average of 0.5 under those circumstances means you're effectively 
utilizing 50% of the I/O capacity - the disk is spending 50% of its wall clock 
time servicing your reads.

Suppose now that you add a background reader that does nothing but generate 
seek-bound small reads, one after another. That, if left alone, would generate 
an average queue size of 1.

Running the two at once should roughly even out: each should get about 0.5.
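
To make that concrete, here is a toy simulation of the simplified model (a 
hypothetical sketch, not Cassandra code; it assumes unit service time, strict 
FIFO with no reordering, a background reader that resubmits immediately, and 
live reads arriving once every 2 time units):

    from collections import deque

    SERVICE = 1.0          # each read occupies the disk for one time unit
    HORIZON = 100000       # total simulated time units

    queue = deque(["bg"])  # the background reader starts with one read outstanding
    next_live = 0.0        # live reads arrive every 2 time units (i.e. rate 0.5)
    done = {"bg": 0, "live": 0}

    t = 0.0
    while t < HORIZON:
        while next_live <= t:      # enqueue live arrivals that have occurred
            queue.append("live")
            next_live += 2.0
        req = queue.popleft()      # strict FIFO, one request serviced at a time
        t += SERVICE
        done[req] += 1
        if req == "bg":
            queue.append("bg")     # the seek-bound reader immediately resubmits

    for stream, count in done.items():
        # fraction of the disk's wall clock time each stream received
        print(stream, count * SERVICE / t)   # both come out around 0.5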

But suppose you run two background readers. In order for your live reads to 
"get" 50% of the time the disk spends doing reads, they must again have as 
many outstanding requests on average as the background readers. The background 
readers have two outstanding between them, so your live traffic must maintain 
two outstanding requests as well.

This means that at any given moment you need a queue depth of 4, two of which 
are live traffic. So when a new live request comes in, again absent 
prioritization/re-ordering, it must wait for 4 requests to complete before it 
can be serviced. Thus, request time becomes 5x instead of 1x (x being the time 
needed to perform one read), because the request must wait for 4 other reads 
in addition to doing its own.
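
Codifying that arithmetic (again just a sketch of the simplified model, not of 
real disk behavior): under FIFO, a stream's share of disk time is its 
outstanding requests divided by the total outstanding, and a newly arriving 
request waits for everything already queued before being serviced.

    def fifo_model(live_outstanding, background_readers):
        # each background reader keeps exactly one request outstanding
        total = live_outstanding + background_readers
        live_share = live_outstanding / total
        # a new request waits for the whole queue, then does its own read
        latency_multiplier = total + 1
        return live_share, latency_multiplier

    print(fifo_model(2, 2))  # -> (0.5, 5): a 50% share at the cost of 5x latency
    print(fifo_model(1, 1))  # -> (0.5, 3): the single-background-reader case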

Now, in reality the queue depth is preserved down onto a SAS/SATA bus, and 
disks will optimize I/O: completely random seek-bound I/O can roughly double 
its throughput (reqs/second) once queue depth starts to reach 32-64 requests - 
unless this has changed since a few years ago. With RAID you have multiple 
constituent drives, which changes things. SSDs complicate matters too by being 
capable of real physical concurrent I/O.

So the above example would never reflect reality in terms of concrete numbers, 
but the "effect" is there - the details are just different.



> Migrate cached pages during compaction 
> ---------------------------------------
>
>                 Key: CASSANDRA-1902
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1902
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7.1
>            Reporter: T Jake Luciani
>            Assignee: Pavel Yaskevich
>             Fix For: 0.7.5, 0.8
>
>         Attachments: 
> 0001-CASSANDRA-1902-cache-migration-impl-with-config-option.txt, 
> 1902-BufferedSegmentedFile-logandsleep.txt, 1902-formatted.txt, 
> 1902-per-column-migration-rebase2.txt, 1902-per-column-migration.txt, 
> CASSANDRA-1902-v3.patch, CASSANDRA-1902-v4.patch, CASSANDRA-1902-v5.patch, 
> CASSANDRA-1902-v6.patch
>
>   Original Estimate: 32h
>          Time Spent: 56h
>  Remaining Estimate: 0h
>
> Post CASSANDRA-1470 there is an opportunity to migrate cached pages from a 
> pre-compacted CF during the compaction process. This is now important since 
> CASSANDRA-1470 caches effectively nothing.
> For example, an active CF being compacted hurts reads since nothing is cached 
> in the new SSTable.
> The purpose of this ticket, then, is to make sure SOME data is cached from 
> active CFs. This can be done by monitoring which old SSTables are in the page 
> cache and caching active rows in the new SSTable.
> A simpler yet similar approach is described here: 
> http://insights.oetiker.ch/linux/fadvise/
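
As a minimal illustration of the fadvise approach from that article (a 
hypothetical Python sketch with made-up paths - Cassandra itself is Java, and 
this only demonstrates the kernel interface, not the actual patch):

    import os

    # Hypothetical paths standing in for a compaction's input and output files.
    OLD_SSTABLE = "/var/lib/cassandra/data/ks/cf-old-Data.db"
    NEW_SSTABLE = "/var/lib/cassandra/data/ks/cf-new-Data.db"

    def advise(path, advice):
        fd = os.open(path, os.O_RDONLY)
        try:
            # offset 0 with length 0 applies the advice to the whole file
            os.posix_fadvise(fd, 0, 0, advice)
        finally:
            os.close(fd)

    # Warm the replacement file so reads against it hit the page cache,
    # then tell the kernel the obsolete file's pages can be dropped.
    advise(NEW_SSTABLE, os.POSIX_FADV_WILLNEED)
    advise(OLD_SSTABLE, os.POSIX_FADV_DONTNEED)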

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
