[ https://issues.apache.org/jira/browse/CASSANDRA-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12970102#action_12970102 ]
Peter Schuller commented on CASSANDRA-1470:
-------------------------------------------
Just to clarify, then: as jbellis surmised, my comments were indeed based on the
fact that writes will be synchronous. In particular, what write caching normally
gives you is the ability to defer the actual writing, such that:
(1) future writes can be coalesced with past writes, which in the extreme case
turns seek-bound I/O into huge slabs of sequential I/O
(2) re-written pages aren't re-written on disk
(3) the program can continue (e.g. churning CPU) without blocking to wait for
disk I/O (illustrated by the sketch after this list)
(4) it de-couples the size of the individual writes the application happens to
make from the way the data gets written out to disk
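To make point (3) concrete, here is a small stand-alone sketch (the file name
and sizes are made up, and the exact numbers obviously depend on the machine):
the individual write() calls normally return as soon as the page cache has
accepted the data, and the wait for the disk only shows up at the explicit
force()/fsync:
{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class PageCacheDemo {
    public static void main(String[] args) throws IOException {
        Path path = Path.of("/tmp/pagecache-demo.bin");          // made-up path
        try (FileChannel ch = FileChannel.open(path,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            ByteBuffer chunk = ByteBuffer.allocate(4096);
            long start = System.nanoTime();
            for (int i = 0; i < 10_000; i++) {                   // ~40 MB in small writes
                chunk.clear();
                ch.write(chunk);       // normally returns once the page cache has a copy
            }
            long written = System.nanoTime();
            ch.force(false);           // this is where we actually wait for the disk
            long synced = System.nanoTime();
            System.out.printf("writes: %d ms, fsync: %d ms%n",
                    (written - start) / 1_000_000, (synced - written) / 1_000_000);
        }
    }
}
{code}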
Using direct I/O in the general case is difficult because the kernel contains a
lot of logic to implement the above in a way that works for arbitrary workloads.
But with Cassandra, we:
(1) are not concerned with re-writing pages
(2) are not concerned with mixing seek-bound and streaming I/O
(3) are specifically after writing large amounts of data, and we can choose when
to flush in-memory buffers (see the sketch after this list)
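For illustration, writing large block-aligned slabs with O_DIRECT could look
roughly like the sketch below. Note this is written against a modern JDK
(ExtendedOpenOption.DIRECT and ByteBuffer.alignedSlice) purely to keep the
example self-contained, not as a claim about what the attached patches do; the
path, slab size and block size are made up:
{code:java}
import com.sun.nio.file.ExtendedOpenOption;               // JDK 10+ only
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class DirectSlabWriter {
    private static final int ALIGN = 4096;                // assumed filesystem block size
    private static final int SLAB = 8 * 1024 * 1024;      // large buffer, flushed when we choose

    public static void main(String[] args) throws IOException {
        Path out = Path.of("/tmp/compaction-output.db");  // made-up output file
        // O_DIRECT requires buffer address, file offset and transfer size to be block-aligned.
        ByteBuffer slab = ByteBuffer.allocateDirect(SLAB + ALIGN).alignedSlice(ALIGN);
        try (FileChannel ch = FileChannel.open(out,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                ExtendedOpenOption.DIRECT)) {
            fillWithSerializedRows(slab);                 // placeholder for real compaction output
            slab.flip();
            while (slab.hasRemaining()) {
                ch.write(slab);                           // goes straight to disk, bypassing the page cache
            }
        }
    }

    private static void fillWithSerializedRows(ByteBuffer slab) {
        byte[] block = new byte[ALIGN];                   // dummy block-sized chunks
        while (slab.remaining() >= ALIGN) {
            slab.put(block);
        }
    }
}
{code}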
So the problem becomes easier. But still, each direct write will essentially
behave like a write() followed by an fsync(), with the performance implications
that has (though not necessarily exactly; e.g. an asynchronous write() followed
by an fsync() might sit waiting in an I/O queue if the fsync() doesn't raise the
priority of the earlier write, depending on exact kernel behavior and whatnot).
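In other words, each direct write is roughly equivalent to the following
(a hypothetical helper, just to spell out the comparison):
{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public final class SyncEveryChunk {
    // Roughly the behavior each direct write approximates, expressed with
    // ordinary buffered I/O: hand over a chunk, then wait for it to hit the device.
    static void writeChunkSynchronously(FileChannel ch, ByteBuffer chunk) throws IOException {
        while (chunk.hasRemaining()) {
            ch.write(chunk);      // hand the data to the page cache
        }
        ch.force(false);          // ...and then block until it is actually on disk
    }
}
{code}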
As far as I know, given large chunks being written, we really should be able to
achieve throughput similar to the background writeback done by the kernel. With
one major caveat: if the writing is single-threaded, the lack of an asynchronous
syscall API means the thread will not be able to keep busy with CPU-bound
activity while waiting for the actual write. So while the writing, when it does
happen, really should have the potential to be efficient, if one does want to be
simultaneously CPU-bound in e.g. compaction, the writing would have to happen
from a background thread.
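Such a background writer could be as simple as the sketch below (class and
method names are made up, error handling is omitted; a bounded queue gives
natural back-pressure so the compaction thread can't run arbitrarily far ahead
of the disk):
{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class BackgroundSlabWriter implements Runnable {
    private final FileChannel channel;
    private final BlockingQueue<ByteBuffer> slabs = new LinkedBlockingQueue<>(2); // bounded on purpose

    public BackgroundSlabWriter(FileChannel channel) {
        this.channel = channel;
    }

    /** Called by the compaction thread; blocks only if the writer falls far behind. */
    public void submit(ByteBuffer slab) throws InterruptedException {
        slabs.put(slab);
    }

    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                ByteBuffer slab = slabs.take();           // wait for the next filled slab
                while (slab.hasRemaining()) {
                    channel.write(slab);                  // the synchronous write happens here
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();           // shut down quietly
        } catch (IOException e) {
            throw new RuntimeException(e);                // a real implementation would surface this
        }
    }
}
{code}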
However, note that the waiting on I/O is not necessarily as bad as it sounds.
If your compaction is heavily CPU bound, the effect will be small in relative
terms because very little time is spent doing the I/O anyway. If the compaction
is heavily disk bound, you don't really care either, since any additional time
compaction spends waiting is just going to *lessen* its negative impact, because
it decreases the effect on live traffic.
The most significant effect should be seen when compaction is reasonably
balanced between CPU and disk, and in the extreme case one should potentially
see up to a halving of compaction speed in a situation without live traffic
further delaying I/O.
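To put a rough, purely illustrative number on that extreme case: if a compaction
needs, say, 60 seconds of CPU work and 60 seconds of disk time, overlapping the
two takes about 60 seconds, while strictly alternating between them takes about
120 - i.e. half the speed.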
I hope I'm being clear :) (And definitely do correct me if I'm overlooking
something.) I feel a bit bad commenting all the time without actually putting
up any code...
> use direct io for compaction
> ----------------------------
>
> Key: CASSANDRA-1470
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1470
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Jonathan Ellis
> Assignee: Pavel Yaskevich
> Fix For: 0.7.1
>
> Attachments: 1470-v2.txt, 1470.txt, CASSANDRA-1470-for-0.6.patch,
> CASSANDRA-1470-v10-for-0.7.patch, CASSANDRA-1470-v11-for-0.7.patch,
> CASSANDRA-1470-v12-0.7.patch, CASSANDRA-1470-v2.patch,
> CASSANDRA-1470-v3-0.7-with-LastErrorException-support.patch,
> CASSANDRA-1470-v4-for-0.7.patch, CASSANDRA-1470-v5-for-0.7.patch,
> CASSANDRA-1470-v6-for-0.7.patch, CASSANDRA-1470-v7-for-0.7.patch,
> CASSANDRA-1470-v8-for-0.7.patch, CASSANDRA-1470-v9-for-0.7.patch,
> CASSANDRA-1470.patch,
> use.DirectIORandomAccessFile.for.commitlog.against.1022235.patch
>
>
> When compaction scans through a group of sstables, it forces out of the OS
> buffer cache the data being used for hot reads, which can have a dramatic
> negative effect on performance.