[ https://issues.apache.org/jira/browse/CASSANDRA-6809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102212#comment-14102212 ]
Jason Brown commented on CASSANDRA-6809:
----------------------------------------
So, I'll be honest, I'm not sure why this complicated solution is better than
using a simple file-level compression (conceptually, something like adding
another OutputStream decorator). I don't believe we need a slavish fixation on
always staying within the pre-allocated size bound of the file - using less
than what's declared in the file is a minor inefficiency, and using more
(making the file larger) should just write out the extra data anyways (perhaps
not contiguously), perhaps with a minor/trivial penalty (one that I'm not sure
is measurable / affects the real-world behavior of cassandra). The file will
get resized to the declared size in the yaml upon recycling, anyways, so I
don't see a real disk consumption problem.
As I'm working on CASSANDRA-6018, the simpler solution is looking very sensible
and is easy to reason about. Admittedly, I'm building that on top of 2.0 and
haven't even begun thinking about merging into 2.1 (and the changes that have
happened there). But, I think before we go down this path of micro-segments and more
threads/pools, can we reevaluate simpler techniques that do not render the
codebase more complicated for what is arguably not that big of a win in the
first place?
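
To make the "another OutputStream decorator" idea concrete, here is a minimal
sketch, not code from Cassandra or from any patch on this ticket. The class and
method names (CompressedLogSketch, compressing, writeRecords, readAll) are
hypothetical, a ByteArrayOutputStream stands in for the segment file, and
Deflate from java.util.zip is assumed purely for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical sketch: file-level compression as one more stream decorator.
public class CompressedLogSketch {

    // Wrap any sink in a compressing decorator; code that writes records
    // does not need to know whether the sink is compressed.
    static OutputStream compressing(OutputStream sink) {
        // syncFlush = true: flush() pushes pending compressed bytes to the sink.
        return new DeflaterOutputStream(sink, true);
    }

    // Write the records through the decorator and return the bytes that
    // would have landed in the file.
    static byte[] writeRecords(byte[][] records) throws IOException {
        ByteArrayOutputStream file = new ByteArrayOutputStream(); // stand-in for a segment file
        try (OutputStream out = compressing(file)) {
            for (byte[] record : records) {
                out.write(record);
                out.flush(); // loosely plays the role of a commit log sync
            }
        }
        return file.toByteArray();
    }

    // Replay side: decompress the whole stream back into raw bytes.
    static byte[] readAll(byte[] compressed) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        return out.toByteArray();
    }
}
```

With this shape the on-disk size is simply whatever the compressor emits, so
the file may end up smaller or larger than its pre-allocated size, which is the
trade-off discussed above.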
> Compressed Commit Log
> ---------------------
>
> Key: CASSANDRA-6809
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6809
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Benedict
> Assignee: Branimir Lambov
> Priority: Minor
> Labels: performance
> Fix For: 3.0
>
>
> It seems an unnecessary oversight that we don't compress the commit log.
> Doing so should improve throughput, but some care will need to be taken to
> ensure we use as much of a segment as possible. I propose decoupling the
> writing of the records from the segments. Basically write into a (queue of)
> DirectByteBuffer, and have the sync thread compress, say, ~64K chunks every X
> MB written to the CL (where X is ordinarily CLS size), and then pack as many
> of the compressed chunks into a CLS as possible.
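
The chunk-and-pack proposal in the description can be sketched roughly as
follows. This is an illustration under stated assumptions, not the design that
was implemented: ChunkPackingSketch, compress and pack are hypothetical names,
Deflate stands in for whichever compressor would be chosen, and the queue of
DirectByteBuffers and the sync thread are reduced to a single byte array
processed in one pass.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.Deflater;

// Hypothetical sketch: compress fixed-size chunks, then pack them into segments.
public class ChunkPackingSketch {
    static final int CHUNK_SIZE = 64 * 1024; // the "~64K chunks" from the description

    // Compress one slice of the raw buffer with Deflate.
    static byte[] compress(byte[] raw, int off, int len) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw, off, len);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Split raw into CHUNK_SIZE slices, compress each one, and greedily fill
    // segments so that no segment holds more than segmentCapacity bytes.
    static List<List<byte[]>> pack(byte[] raw, int segmentCapacity) {
        List<List<byte[]>> segments = new ArrayList<>();
        List<byte[]> current = new ArrayList<>();
        int used = 0;
        for (int off = 0; off < raw.length; off += CHUNK_SIZE) {
            int len = Math.min(CHUNK_SIZE, raw.length - off);
            byte[] chunk = compress(raw, off, len);
            if (chunk.length > segmentCapacity) {
                throw new IllegalArgumentException("compressed chunk larger than a segment");
            }
            if (used + chunk.length > segmentCapacity) {
                // The chunk does not fit: close this segment and start a new one.
                segments.add(current);
                current = new ArrayList<>();
                used = 0;
            }
            current.add(chunk);
            used += chunk.length;
        }
        if (!current.isEmpty()) {
            segments.add(current);
        }
        return segments;
    }
}
```

For example, pack(data, 32 * 1024 * 1024) would return one list of compressed
chunks per segment, with each segment filled as far as whole chunks allow.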