[ https://issues.apache.org/jira/browse/CASSANDRA-6809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14285476#comment-14285476 ]

Benedict commented on CASSANDRA-6809:
-------------------------------------

bq. assuming that the sync period is sane (e.g. ~100ms)

The sync period is, by default, 10s, and to my knowledge this is what many 
users run with, so in general we will only compress each individual segment. 
This is still sane, since the cluster has redundancy, although a sync period 
between 100ms and 500ms might be more suitable for high-traffic nodes. Still, 
it's probably not a big deal, since we only care about compression when under 
saturation, which should mean many segments. I only mention it since it is an 
easy extension. This extension also means the sync thread may have compressed 
data _waiting_ for it when it runs, reducing the latency until sync completion.
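
For reference, these are the periodic sync settings in cassandra.yaml; a 
high-traffic node might drop the period from the 10s default to something like:

{code}
commitlog_sync: periodic
commitlog_sync_period_in_ms: 250
{code}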

bq. Let me try to rephrase what you are saying to make sure I understand it 
correctly:

Almost:

* a single sync thread forms sections at regular time intervals and sends them 
to a compression executor/phase (SPMC queue),
* _the sync thread waits on the futures and syncs each in order_ (see the 
sketch below)

Or, with the extension:

* mutators periodically submit segments to the compressor
* once the compressor completes an entire segment, requestExtraSync() is called 
(instead of from advanceAllocatingFrom())
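
To make the ordering concrete, here's a rough sketch of the basic flow 
(hypothetical names, not the actual patch; the compression executor stands in 
for the SPMC queue above):

{code:java}
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class CommitLogSyncSketch
{
    // Compression executor: its threads consume sections submitted by the one sync thread.
    private final ExecutorService compressor = Executors.newFixedThreadPool(2);

    // Futures kept in submission order; only the sync thread touches this queue.
    private final Queue<Future<ByteBuffer>> pending = new ArrayDeque<>();

    // Sync thread, each interval: hand the section written since last time to the compressor.
    void submitSection(ByteBuffer section)
    {
        pending.add(compressor.submit(() -> compress(section)));
    }

    // Sync thread: wait on the futures in order, then write and fsync each in turn.
    void syncPending() throws Exception
    {
        Future<ByteBuffer> f;
        while ((f = pending.poll()) != null)
        {
            ByteBuffer compressed = f.get(); // completes in submission order, so the log stays ordered
            writeAndSync(compressed);        // single writer: no interleaving between sync threads
        }
    }

    private ByteBuffer compress(ByteBuffer section) { return section; /* LZ4/Snappy/etc. in reality */ }
    private void writeAndSync(ByteBuffer buf) { /* append to the segment file, then force() */ }
}
{code}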

bq. Why is this simpler, or of comparable complexity?

We have two steps in the explanation, instead of five. More importantly, there 
is no interleaving of events between multiple sync threads to reason about, and 
the "lastSync" stays accurate (which is important, since an inaccurate lastSync 
could artificially pause writes). This also means future improvements here are 
easier and safer to deliver, because we don't have to reason about how they 
interplay with each other. In particular, rolling lastSync forward after each 
segment is synced is a natural improvement (to ensure write latencies don't 
spike under load), but it is challenging to introduce with multiple sync 
threads. Since we don't expect this feature to be used widely (we expect 
multiple CL disks to be used instead, if you're bottlenecking), the simpler 
approach seems more sensible to me.
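
For illustration, with a single sync thread that per-segment roll-over is just 
something like (hypothetical names):

{code:java}
// Sketch only: roll lastSync forward after each segment, rather than once per pass,
// so writers blocked waiting on a sync never wait for the whole backlog.
class OrderedSyncer
{
    interface Segment { void sync(); long syncedAtNanos(); }

    private volatile long lastSyncedAt;

    void syncInOrder(Iterable<Segment> segments)
    {
        for (Segment segment : segments)
        {
            segment.sync();                         // compress (if pending) + write + fsync
            lastSyncedAt = segment.syncedAtNanos(); // advance lastSync per segment
            // signal any writers waiting on lastSyncedAt here
        }
    }
}
{code}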

bq. Wouldn't the two extra queues waste resources and increase latency?

We have zero extra queues in the typical case, and one extra queue in the 
uncommon use case. If we introduce enough threads that compression is faster 
than disk, then there will be near-zero synchronization costs; of course, if 
that is not the case and we are still bottlenecking on compression, then we 
aren't really losing much (a few microseconds every few hundred milliseconds, 
at 250MB/s compression speed), so it doesn't seem likely to be significant. 

We're now no longer honouring the sync interval; we are syncing more 
frequently, which may reduce disk throughput. The exact timing of the syncs 
relative to one another may also vary, likely falling into lock-step under 
saturation, so there may be short periods of many competing syncs, potentially 
yielding pathological disk behaviour and introducing competition for the 
synchronized blocks inside the segments; in effect this introduces an MPMC 
queue, eliminating those few micros of benefit. 

(FTR, the MPMC, SPMC and MPSC aspects are likely not important here. The only 
concern is thread signalling, but that is the wrong order of magnitude to 
matter when bottlenecking on disk or on compression of large chunks.)

> Compressed Commit Log
> ---------------------
>
>                 Key: CASSANDRA-6809
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6809
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Benedict
>            Assignee: Branimir Lambov
>            Priority: Minor
>              Labels: performance
>             Fix For: 3.0
>
>         Attachments: ComitLogStress.java, logtest.txt
>
>
> It seems an unnecessary oversight that we don't compress the commit log. 
> Doing so should improve throughput, but some care will need to be taken to 
> ensure we use as much of a segment as possible. I propose decoupling the 
> writing of the records from the segments. Basically write into a (queue of) 
> DirectByteBuffer, and have the sync thread compress, say, ~64K chunks every X 
> MB written to the CL (where X is ordinarily CLS size), and then pack as many 
> of the compressed chunks into a CLS as possible.
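
For illustration, a rough sketch of the chunk-and-pack idea described above 
(hypothetical names; the JDK Deflater stands in for whichever compressor the 
commit log would actually use):

{code:java}
import java.nio.ByteBuffer;
import java.util.zip.Deflater;

class ChunkPacker
{
    static final int CHUNK_SIZE = 64 * 1024;           // ~64K uncompressed chunks

    // Compress one chunk of the staging buffer.
    static byte[] compressChunk(byte[] chunk)
    {
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        deflater.setInput(chunk);
        deflater.finish();
        byte[] out = new byte[chunk.length + 64];       // small slack for incompressible input
        int len = deflater.deflate(out);
        deflater.end();
        byte[] result = new byte[len];
        System.arraycopy(out, 0, result, 0, len);
        return result;
    }

    // Pack as many compressed chunks as fit into one segment-sized buffer.
    static int pack(ByteBuffer segment, Iterable<byte[]> compressedChunks)
    {
        int packed = 0;
        for (byte[] chunk : compressedChunks)
        {
            if (segment.remaining() < chunk.length + 4) // 4-byte length prefix per chunk
                break;
            segment.putInt(chunk.length).put(chunk);
            packed++;
        }
        return packed;
    }
}
{code}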



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
