[ https://issues.apache.org/jira/browse/CASSANDRA-14466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16499096#comment-16499096 ]
Jon Haddad commented on CASSANDRA-14466:
----------------------------------------
Interesting idea. I've seen the issue you've described many times on
read-heavy systems (high CPU due to page cache churn).
One thing I've noticed, however, is that I've been able to get an order of
magnitude improvement on these systems by using a 4KB chunk length and
disabling readahead completely. The problem wasn't so much the page cache as
read amplification. What I typically see is that read-heavy workloads tend to
rely on very small reads: a handful of rows. A 64KB chunk + a readahead of
256 = a LOT of cache churn. I'm wondering how direct I/O would compare to a
correctly tuned system.
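To put rough numbers on that read amplification, here is a back-of-the-envelope
sketch in Python. It is only an illustration under stated assumptions: the
readahead value is taken to be in 512-byte sectors (the unit used by
blockdev --setra), the request size of 1 KB is a made-up stand-in for "a
handful of rows", compression is ignored, and the model simply adds the
readahead window to the chunk as a worst case. The function names are
hypothetical and not part of Cassandra.

```python
SECTOR_BYTES = 512  # assumption: readahead is counted in 512-byte sectors


def bytes_read_per_request(chunk_kb, readahead_sectors):
    """Worst-case bytes pulled from disk to serve one small read."""
    return chunk_kb * 1024 + readahead_sectors * SECTOR_BYTES


def read_amplification(request_bytes, chunk_kb, readahead_sectors):
    """Ratio of bytes read from disk to bytes the request actually needed."""
    return bytes_read_per_request(chunk_kb, readahead_sectors) / request_bytes


request = 1024  # hypothetical request size: about 1 KB of row data

# (chunk length in KB, readahead in sectors)
for chunk_kb, ra in [(64, 256), (64, 8), (4, 0)]:
    amp = read_amplification(request, chunk_kb, ra)
    print(f"chunk={chunk_kb}KB readahead={ra}: {amp:.0f}x")
```

Under these assumptions the 64KB chunk with a readahead of 256 comes out at
192x, a readahead of 8 at 68x, and a 4KB chunk with readahead disabled at 4x.
If the 256 is actually in KB rather than sectors, the first figure is larger
still.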
I see DataStax recommends a readahead of 8; I'm not sure what setting you used
in your benchmark. Since direct I/O avoids readahead, it's probably better to
do an apples-to-apples comparison with it disabled completely. Adding readahead
on top of fetching data out of a single disk block for a small request wastes a
lot of I/O, and you might be unfairly crippling baseline Cassandra by using it.
All of that said, with a read-heavy workload randomly distributed over a
dataset as large as you've described, it's very unlikely you'd get a cache hit,
so I can see how this might be a better approach overall. How it works for
other workloads or non-SSD storage is a different matter that we'd need to
benchmark as well.
> Enable Direct I/O
> ------------------
>
> Key: CASSANDRA-14466
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14466
> Project: Cassandra
> Issue Type: New Feature
> Components: Local Write-Read Paths
> Reporter: Mulugeta Mammo
> Priority: Major
> Attachments: direct_io.patch
>
>
> Hi,
> JDK 10 introduced a new API for Direct IO that enables applications to bypass
> the file system cache and potentially improve performance. Details of this
> feature can be found at [https://bugs.openjdk.java.net/browse/JDK-8164900].
> This patch uses the JDK 10 API to enable Direct IO for the Cassandra read
> path. By default, we have disabled this feature; but it can be enabled using
> a new configuration parameter, enable_direct_io_for_read_path. We have
> conducted a Cassandra read-only stress test and measured a throughput gain of
> up to 60% on flash drives.
> The patch requires JDK 10 support in Cassandra:
> https://issues.apache.org/jira/browse/CASSANDRA-9608
> Please review the patch and let us know your feedback.
> Thanks,
> [^direct_io.patch]
>