[ https://issues.apache.org/jira/browse/CASSANDRA-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013665#comment-13013665 ]
Peter Schuller commented on CASSANDRA-1902:
-------------------------------------------

So as an example: suppose you have live read traffic that results in an average queue size of 0.5. Assume for simplicity a single underlying disk that serves exactly one request at a time and does no optimization for locality on disk. An average of 0.5 under those circumstances means you're effectively utilizing 50% of the I/O capacity - the disk is spending 50% of its wall clock time servicing your reads.

Suppose now that you add a background reader that does nothing but generate seek-bound small reads, one after another. Left alone, it would generate an average queue size of 1. Running the two at once should roughly even out: each gets about 0.5.

But suppose you run two background readers. In order for your live reads to "get" 50% of the time the disk spends doing reads, live traffic must again have as many outstanding requests on average as the background readers. The background readers have 2 outstanding, so your live traffic must maintain two outstanding requests as well. This means that at any given moment you need a queue depth of 4, two of which are live traffic. So when a new live request comes in, again absent prioritization/re-ordering, it must wait for 4 requests to complete before it can be serviced. Thus, request time becomes 5x instead of 1x (x being the time needed to perform a read), because the request must wait for 4 other reads in addition to its own.

Now, in reality you have a queue depth that is propagated down onto a SAS/SATA bus, and you have disks that will optimize I/O (completely random seek-bound I/O can get about twice as fast in terms of throughput (reqs/second) when queue depth starts to reach 32-64 requests - unless this has changed since a few years ago). With RAID you have multiple constituent drives, which changes things. SSDs complicate things too by being capable of actual physical concurrent I/O.
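The arithmetic above can be sketched as a tiny model (this is just an illustration of the comment's idealized single-disk example, not Cassandra code; the function name and the fair-sharing assumption are mine):

```python
def live_latency_multiplier(background_readers: int) -> int:
    """Latency of a new live read, as a multiple of x (one read's service time).

    Model assumptions, per the example: one disk, one request at a time,
    no reordering, and fair sharing. For live traffic to receive an equal
    share of disk time, it must keep as many requests outstanding as the
    background readers do, so total queue depth is 2 * background_readers,
    and a new live request waits for all of those before its own read.
    """
    queue_depth = 2 * background_readers
    return queue_depth + 1  # wait for the queue, then do the read itself

# Two background readers -> queue depth 4 -> 5x request time, as in the example.
print(live_latency_multiplier(2))
```

With more background readers the multiplier keeps growing linearly, which is the point: without prioritization, each extra background stream directly inflates live read latency.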
So the above example would never reflect reality in terms of concrete numbers, but the "effect" is there - the details are just different.

> Migrate cached pages during compaction
> ---------------------------------------
>
>                 Key: CASSANDRA-1902
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1902
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7.1
>            Reporter: T Jake Luciani
>            Assignee: Pavel Yaskevich
>             Fix For: 0.7.5, 0.8
>
>         Attachments: 0001-CASSANDRA-1902-cache-migration-impl-with-config-option.txt, 1902-BufferedSegmentedFile-logandsleep.txt, 1902-formatted.txt, 1902-per-column-migration-rebase2.txt, 1902-per-column-migration.txt, CASSANDRA-1902-v3.patch, CASSANDRA-1902-v4.patch, CASSANDRA-1902-v5.patch, CASSANDRA-1902-v6.patch
>
>   Original Estimate: 32h
>          Time Spent: 56h
>  Remaining Estimate: 0h
>
> Post CASSANDRA-1470 there is an opportunity to migrate cached pages from a pre-compacted CF during the compaction process. This is now important since CASSANDRA-1470 caches effectively nothing.
> For example, an active CF being compacted hurts reads, since nothing is cached in the new SSTable.
> The purpose of this ticket, then, is to make sure SOME data is cached from active CFs. This can be done by monitoring which old SSTables are in the page cache and caching active rows in the new SSTable.
> A simpler yet similar approach is described here:
> http://insights.oetiker.ch/linux/fadvise/

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira