[ https://issues.apache.org/jira/browse/CASSANDRA-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013659#comment-13013659 ]

Peter Schuller commented on CASSANDRA-1902:
-------------------------------------------

Regarding drop caches: Right, I don't remember whether the echo blocks until 
eviction is complete or not (where it is slow, it should be CPU bound and not 
imply I/O). But I made sure that: (1) the echo terminated, (2) I got iostat 
running, (3) I waited out the flurry of I/O that background operations on any 
modern machine suddenly generate when you do a *complete* buffer cache drop, 
(4) I saw the device go idle, and (5) only then did Cassandra begin the 
pre-population.

So hopefully that part of the test should be kosher.
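
In rough pseudo-code form, the equivalent of what I did would be something 
like the following (a sketch only, not what I actually ran; assumes Linux, 
root privileges, and that "sda" stands in for the real device):

    import time

    def inflight(device="sda"):
        # /proc/diskstats: the 9th stats field after the device name is the
        # number of I/Os currently in progress.
        with open("/proc/diskstats") as f:
            for line in f:
                parts = line.split()
                if parts[2] == device:
                    return int(parts[11])
        raise ValueError("device not found: %s" % device)

    # (1) Complete buffer cache drop; the equivalent of
    #     "echo 3 > /proc/sys/vm/drop_caches".
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("3\n")

    # (2)-(4) Wait for the flurry of background I/O triggered by the drop to
    # die down before starting the pre-population (and the iostat sampling).
    # A few consecutive idle samples would be more robust than a single one.
    while inflight() > 0:
        time.sleep(1)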

Regarding mere mortals ;) Sorry. Is it the iostat stuff which is unclear? I'm 
looking at (in a monospaced font btw, for alignment...) the avgqu-sz column 
which indicates the average number of outstanding I/O requests for the sampling 
duration (1 second in this case). This is effectively the "queue depth".

There are usually two main interesting things about "iostat -k -x 1" (-x being 
key). One is utilization, which shows the percentage of time there was *any* 
outstanding I/O request to the device. (But one has to interpret it in context; 
for example a RAID0 of 10 disks can be at "100%" utilization yet be only 10% 
saturated.) The other is the average queue size, which is a more direct 
indication of how many concurrent requests are being serviced.

In the case of the 10 disk RAID0, 100% utilization with an average queue size 
of 1 would mean roughly 10% saturation of underlying disks. 100% utilization 
with an average queue size of 5 would mean roughly 50% saturation of underlying 
disks.
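
As a back-of-the-envelope way of putting that in numbers (a rough rule of 
thumb only; it assumes requests spread evenly over the constituent disks and 
ignores striping details):

    def approx_saturation(avg_queue_size, n_disks):
        # Crude rule of thumb: each constituent disk sees roughly
        # avgqu-sz / n_disks outstanding requests; call one outstanding
        # request per disk "saturated".
        return min(avg_queue_size / float(n_disks), 1.0)

    print(approx_saturation(1, 10))   # 0.1 -> the "roughly 10% saturated" case
    print(approx_saturation(5, 10))   # 0.5 -> the "roughly 50% saturated" case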

The other relevance of the average queue size is to latency. Disregarding any 
relative prioritization going on, if the average number of outstanding 
requests is, say, 10, any single request will typically have to wait for 10 
other requests to be serviced first. (But again that has to be interpreted in 
context; if you have 10 constituent disks in a RAID0, that 10 is effectively 1 
for latency purposes.)

So, when judging the expected effects on the latency (and throughput) of "live 
reads", it's interesting to look at these values.

In particular, consider the simple case of a single seek-bound serial reader. 
If the average queue depth is 5, this single reader would probably see a 
throughput roughly 1/5 of normal (I'm assuming otherwise identical I/O in 
terms of request sizes). A practical example is something like a "tar -czvf" 
that is reading a lot of small files (fs metadata etc).

So in that sense, a constant pressure of 5 outstanding requests will cause a 
very significant slow-down to the serial reader.

On the other hand, if instead of a single serial reader you have N concurrent 
readers, you would expect a throughput more like N/(N + 5 - 1) of normal. As 
the concurrency of the interesting I/O increases, the 5 extra requests make 
less of a difference.
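
To make the crude model explicit (same assumptions as above: identical 
request sizes, no relative prioritization):

    def relative_throughput(n_readers, queue_depth=5):
        # Fraction of "normal" throughput the live readers see, using the
        # N / (N + 5 - 1) relation above (queue_depth = 5).
        return n_readers / float(n_readers + queue_depth - 1)

    print(relative_throughput(1))    # 0.2   -> the serial reader at ~1/5 of normal
    print(relative_throughput(4))    # 0.5
    print(relative_throughput(20))   # ~0.83 -> the extra pressure matters less as N grows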

You tend to reach an interesting equilibrium here. Suppose you normally serve 
some number of requests per second, and that this gives rise to an average 
queue depth of 0.5. Now add the constant background pressure of 5 requests. 
Assuming the reads (that normally gave rise to the 0.5 queue depth) are 
*independent* (i.e., added latency to one does not prevent the next one from 
coming in), what tends to happen is that you start accumulating outstanding 
requests until the number of concurrent requests is high enough that you reach 
the throughput you had before. Only instead of 0.5 average concurrency, you 
have something higher than 5. Whatever is required to "drown out" the extra 5 
enough.
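
Plugging numbers into the same crude model, just to illustrate the "drowning 
out":

    def required_concurrency(target_fraction, queue_depth=5):
        # Invert the N / (N + 5 - 1) relation: how many live requests must be
        # outstanding, on average, to get back target_fraction of the original
        # throughput despite the constant background pressure.
        return target_fraction * (queue_depth - 1) / (1 - target_fraction)

    print(required_concurrency(0.5))   # 4.0   outstanding for half of normal
    print(required_concurrency(0.9))   # 36.0  for 90% of normal
    print(required_concurrency(0.99))  # 396.0 for 99% of normal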

Even if you are able to reach your desired throughput (requests per second) 
like this, it significantly adds to the average latency of each read. Not only 
will each read have to contend with the extra 5 background I/O operations that 
are always pending, it also has to compete with the other concurrent "live" 
requests.
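
As a very rough way to quantify that, via Little's law (average latency is 
roughly outstanding requests divided by throughput) and the equilibrium 
example above, with all the same caveats:

    def latency_multiplier(old_outstanding, new_outstanding,
                           old_throughput=1.0, new_throughput=1.0):
        # Little's law: average latency ~ outstanding requests / throughput.
        return (new_outstanding / new_throughput) / (old_outstanding / old_throughput)

    # 0.5 outstanding before vs. ~36 outstanding live requests at ~90% of the
    # original throughput: average latency up by a factor of ~80.
    print(latency_multiplier(0.5, 36, 1.0, 0.9))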

> Migrate cached pages during compaction 
> ---------------------------------------
>
>                 Key: CASSANDRA-1902
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1902
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7.1
>            Reporter: T Jake Luciani
>            Assignee: Pavel Yaskevich
>             Fix For: 0.7.5, 0.8
>
>         Attachments: 
> 0001-CASSANDRA-1902-cache-migration-impl-with-config-option.txt, 
> 1902-BufferedSegmentedFile-logandsleep.txt, 1902-formatted.txt, 
> 1902-per-column-migration-rebase2.txt, 1902-per-column-migration.txt, 
> CASSANDRA-1902-v3.patch, CASSANDRA-1902-v4.patch, CASSANDRA-1902-v5.patch, 
> CASSANDRA-1902-v6.patch
>
>   Original Estimate: 32h
>          Time Spent: 56h
>  Remaining Estimate: 0h
>
> Post CASSANDRA-1470 there is an opportunity to migrate cached pages from a 
> pre-compacted CF during the compaction process.  This is now important since 
> CASSANDRA-1470 caches effectively nothing.  
> For example an active CF being compacted hurts reads since nothing is cached 
> in the new SSTable. 
> The purpose of this ticket then is to make sure SOME data is cached from 
> active CFs. This can be done by monitoring which Old SSTables are in the 
> page cache and caching active rows in the New SSTable.
> A simpler yet similar approach is described here: 
> http://insights.oetiker.ch/linux/fadvise/

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
