[
https://issues.apache.org/jira/browse/CASSANDRA-8895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629872#comment-14629872
]
Benedict commented on CASSANDRA-8895:
-------------------------------------
The basic idea of this is to compress using the smallest buffer size we can, so
that we waste no IOPs (or cycles) answering reads. This is especially important
for small partitions.
To give a rough outline of how I think this should work: the idea would be to
buffer up to, say, 1Mb for any sstable we write (instead of the current 64Kb),
at least for the first Mb, or perhaps for the first 10Mb (iteratively), or
perhaps once every 50Mb, for 1Mb. These details aren't very important and can
be tweaked later.
We introduce a configuration parameter that specifies the minimum compression
ratio that must be achieved for compression to be worth pursuing, and a ratio
of improvement required to warrant a larger block size (let's say we require
15% better compression to warrant doubling the block size).
Once we have this and our larger-than-normal buffer, we essentially perform a
binary search to find our optimal chunk size.
* We start with _no_ compression, and 64Kb chunk size with compression.
* If 64Kb is not above our minimum compression ratio, we use no compression.
* If it is, we try the mid-point _of the logarithmic scale_ (i.e. if 4Kb is our
minimum chunk size, we have possible sizes of 4, 8, 16, 32, 64, so our midpoint
would be 8 or 16)
** If that is a better choice based on our parameters, we then try the
mid-point of that and our minimum (i.e. 4 or 8), etc.
** If not, we try the mid-point above (i.e. 16 or 32), etc.
* We then use this chunk size for the contents of the buffer _and all
subsequent writes_
We should most likely short-circuit this if, on compaction, we estimate a
single partition to be larger than 64Kb; alternatively, we should set the
lower bound of our chunk size to whatever size our partitions are estimated
to be.
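A hedged sketch of that lower-bound variant (the helper name and the
round-up-to-power-of-two behaviour are my assumptions, not anything decided
here):

```python
def effective_min_chunk(estimated_partition_size,
                        min_chunk=4 * 1024, max_chunk=64 * 1024):
    """Raise the chunk-size search's lower bound so a single chunk covers
    the estimated partition size, rounded up to a power of two and capped
    at the 64Kb maximum."""
    size = min_chunk
    while size < min(estimated_partition_size, max_chunk):
        size *= 2
    return size
```

With partitions estimated above 64Kb this returns the maximum, which
degenerates into the short-circuit case above.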
There are a lot of implementation decisions to be made, such as whether the
no-compression route pretends to be a compressed file that the compressed
reader simply understands, or whether we make that decision earlier (both
have their unpleasant aspects); and whether we make a single decision
up-front or continually reassess it.
> Compressed sstables should only compress if the win is above a certain
> threshold, and should use a variable block size
> ----------------------------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-8895
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8895
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Benedict
> Assignee: Paulo Motta
> Labels: performance
> Fix For: 3.x
>
>
> On performing a flush to disk, we should assess if the data we're flushing
> will actually be substantively compressed, and how large the page should be
> to get optimal compression ratio versus read latency. Decompressing 64Kb
> chunks is wasteful when reading small records.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)