[ 
https://issues.apache.org/jira/browse/CASSANDRA-7404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216780#comment-14216780
 ] 

Ariel Weisberg commented on CASSANDRA-7404:
-------------------------------------------

I ended up keeping things simple. To avoid issues with large numbers of files, I 
have RandomAccessReader fall back to buffered IO if more than 64 files are open 
with O_DIRECT, so the kernel can handle the page cache management.
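
A minimal sketch of the fallback described above, assuming a shared counter of 
open O_DIRECT readers. DirectIoBudget, Mode, acquire and release are 
hypothetical names for illustration and are not taken from the patch; only the 
limit of 64 comes from the comment.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper (not from the patch): caps the number of O_DIRECT readers. */
final class DirectIoBudget {
    enum Mode { DIRECT, BUFFERED }

    private final int maxDirect;
    private final AtomicInteger openDirect = new AtomicInteger();

    DirectIoBudget(int maxDirect) {
        this.maxDirect = maxDirect;
    }

    /** Reserves a direct IO slot if one is free, otherwise reports buffered IO. */
    Mode acquire() {
        while (true) {
            int current = openDirect.get();
            if (current >= maxDirect) {
                return Mode.BUFFERED; // over budget: let the kernel manage the page cache
            }
            if (openDirect.compareAndSet(current, current + 1)) {
                return Mode.DIRECT;
            }
        }
    }

    /** Gives the slot back when a reader closes; buffered readers hold no slot. */
    void release(Mode mode) {
        if (mode == Mode.DIRECT) {
            openDirect.decrementAndGet();
        }
    }

    int openDirectCount() {
        return openDirect.get();
    }
}
```

With new DirectIoBudget(64), the first 64 concurrent readers get Mode.DIRECT and 
any reader opened beyond that gets Mode.BUFFERED until one of the direct readers 
is released.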

The initial test result doesn't seem to show any impact on performance.
http://cstar.datastax.com/graph?stats=6a743f8e-6eae-11e4-8da2-bc764e04482c&metric=op_rate&operation=write&smoothing=1&show_aggregates=true

Another, larger mixed read/write test is in progress, and it's going to take a 
while, possibly into Thursday morning.
http://cstar.datastax.com/tests/id/f2f6fa72-6f53-11e4-8b91-bc764e04482c

Code is reviewable at
https://github.com/aweisberg/cassandra/compare/CASSANDRA-7404
https://github.com/aweisberg/cassandra/compare/CASSANDRA-7404.patch

This commit 
https://github.com/aweisberg/cassandra/commit/1b2ca858818eb37debc5bb154d6628bb514ba3fd
 deserves special attention, as it is where I chose which code is going to try 
to use O_DIRECT and which isn't. Anything that looked like sequential access 
that is going to consume the entire file uses O_DIRECT.

I am kind of at a loss as to how to deal with the various flags that go into the 
various kinds of scanners/readers/whatever. The call graph going up from 
RandomAccessReader's constructor is large and amplified by the number of 
wrappers used to provide default arguments or drop arguments.

I would fix it by using a builder, except there are several layers of 
indirection where these decisions are made, so I think it would take more than 
one builder: one for each abstraction. Maybe we just need one builder to make a 
configuration description for preferences like rate limiting and direct IO.
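
The single-builder idea could be sketched roughly as follows. ReaderOptions, its 
fields and its builder methods are hypothetical names chosen for illustration; 
nothing here is taken from the patch or from the existing Cassandra API.

```java
/** Hypothetical immutable description of reader preferences (not from the patch). */
final class ReaderOptions {
    final boolean preferDirectIo;
    final long maxBytesPerSecond; // 0 means "no rate limit" in this sketch

    private ReaderOptions(Builder builder) {
        this.preferDirectIo = builder.preferDirectIo;
        this.maxBytesPerSecond = builder.maxBytesPerSecond;
    }

    static Builder builder() {
        return new Builder();
    }

    static final class Builder {
        // Defaults replace the wrapper methods that exist only to supply default arguments.
        private boolean preferDirectIo = false;
        private long maxBytesPerSecond = 0;

        Builder preferDirectIo(boolean value) {
            this.preferDirectIo = value;
            return this;
        }

        Builder maxBytesPerSecond(long value) {
            if (value < 0) {
                throw new IllegalArgumentException("rate limit must be >= 0");
            }
            this.maxBytesPerSecond = value;
            return this;
        }

        ReaderOptions build() {
            return new ReaderOptions(this);
        }
    }
}
```

A caller doing a sequential scan would then pass one object down through the 
layers, for example 
ReaderOptions.builder().preferDirectIo(true).maxBytesPerSecond(16L * 1024 * 1024).build(), 
instead of threading individual flags through each wrapper.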

> Use direct i/o for sequential operations (compaction/streaming)
> ---------------------------------------------------------------
>
>                 Key: CASSANDRA-7404
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7404
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Jason Brown
>            Assignee: Ariel Weisberg
>              Labels: performance
>             Fix For: 3.0
>
>
> Investigate using Linux's direct i/o for operations where we read 
> sequentially through a file (repair and bootstrap streaming, compaction 
> reads, and so on). Direct i/o does not go through the kernel page cache, so it 
> should leave the hot cache pages used for live reads unaffected.
> Note: by using direct i/o, we will probably take a performance hit on reading 
> the file we're sequentially scanning through (that is, compactions may get 
> slower), but the goal of this ticket is to limit the impact of these 
> background tasks on the main read/write functionality. Of course, I'll 
> measure any perf hit that is incurred, and see if there are any mechanisms to 
> mitigate it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
