[ https://issues.apache.org/jira/browse/CASSANDRA-6809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102212#comment-14102212 ]

Jason Brown commented on CASSANDRA-6809:
----------------------------------------

So, I'll be honest: I'm not sure why this complicated solution is better than 
simple file-level compression (conceptually, something like adding another 
OutputStream decorator - see the sketch below). I don't believe we need a 
slavish fixation on always staying within the pre-allocated size bound of the 
file. Using less than what's declared in the file is a minor inefficiency, and 
using more (making the file larger) should just write out the extra data 
anyway (perhaps not contiguously), with at worst a minor/trivial penalty - one 
that I'm not sure is measurable or affects the real-world behavior of 
Cassandra. The file will get resized to the declared size in the yaml upon 
recycling anyway, so I don't see a real disk consumption problem.
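
To make the decorator idea concrete, here's a rough sketch (the class and
method names are hypothetical, and java.util.zip.DeflaterOutputStream just
stands in for whatever codec we'd actually pick - LZ4, Snappy, etc.):

    import java.io.BufferedOutputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.OutputStream;
    import java.util.zip.DeflaterOutputStream;

    // Hypothetical sketch only: the commit log writer keeps seeing a plain
    // OutputStream; compression is just one more decorator in the chain.
    public final class CompressedSegmentStreams
    {
        public static OutputStream open(String path) throws IOException
        {
            OutputStream file = new FileOutputStream(path);
            // The 64KB buffer size is arbitrary for the sketch;
            // DeflaterOutputStream stands in for the real codec.
            return new BufferedOutputStream(new DeflaterOutputStream(file), 64 * 1024);
        }
    }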

As I'm working on CASSANDRA-6018, the simpler solution is looking very sensible 
and is easy to reason about. Admittedly, I'm building that on top of 2.0 and 
haven't even begun thinking about merging into 2.1 (and the changes that have 
happened there). But before we go down this path of micro-segments and more 
threads/pools, can we reevaluate simpler techniques that don't render the 
codebase more complicated for what is arguably not that big a win in the 
first place?

> Compressed Commit Log
> ---------------------
>
>                 Key: CASSANDRA-6809
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6809
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Benedict
>            Assignee: Branimir Lambov
>            Priority: Minor
>              Labels: performance
>             Fix For: 3.0
>
>
> It seems an unnecessary oversight that we don't compress the commit log. 
> Doing so should improve throughput, but some care will need to be taken to 
> ensure we use as much of a segment as possible. I propose decoupling the 
> writing of the records from the segments. Basically: write into a (queue of) 
> DirectByteBuffers, have the sync thread compress, say, ~64K chunks every X MB 
> written to the CL (where X is ordinarily the CommitLogSegment size), and then 
> pack as many of the compressed chunks into a segment as possible.
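
For concreteness, a minimal sketch of the chunking scheme described above (all
names are hypothetical, and java.util.zip.Deflater stands in for the real
codec):

    import java.nio.ByteBuffer;
    import java.util.ArrayDeque;
    import java.util.Queue;
    import java.util.zip.Deflater;

    // Hypothetical sketch only: records accumulate in direct buffers; the
    // sync thread compresses ~64K chunks and packs as many compressed chunks
    // as will fit into the current segment.
    public final class ChunkingSync
    {
        private static final int CHUNK_SIZE = 64 * 1024;

        private final Queue<ByteBuffer> pending = new ArrayDeque<>(); // filled by writers
        private final Deflater deflater = new Deflater();

        /** Compress queued record data and pack it into the segment buffer. */
        void sync(ByteBuffer segment)
        {
            byte[] in = new byte[CHUNK_SIZE];
            byte[] out = new byte[CHUNK_SIZE * 2]; // room for incompressible worst case
            ByteBuffer chunk;
            while ((chunk = pending.peek()) != null)
            {
                int pos = chunk.position();
                int len = Math.min(chunk.remaining(), CHUNK_SIZE);
                chunk.get(in, 0, len);

                deflater.reset();
                deflater.setInput(in, 0, len);
                deflater.finish();
                int clen = deflater.deflate(out);

                if (segment.remaining() < 4 + clen)
                {
                    chunk.position(pos); // segment full: leave data for the next segment
                    break;
                }
                segment.putInt(clen).put(out, 0, clen); // length-prefixed compressed chunk
                if (!chunk.hasRemaining())
                    pending.poll();
            }
        }
    }

Presumably recovery would read each length prefix, decompress the chunk, and
replay records as today; the per-chunk framing is what would let a partially
filled segment still be parsed.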


