[
https://issues.apache.org/jira/browse/CASSANDRA-8895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629872#comment-14629872
]
Benedict commented on CASSANDRA-8895:
-------------------------------------
The basic idea of this is to compress using the smallest buffer size we can, so
that we waste no IOPs (or cycles) answering reads. This is especially important
for small partitions.
To give a rough outline of how I think this should work: the idea would be to
buffer up to, say, 1Mb for any sstable we write (instead of the current 64Kb),
at least for the first Mb, or perhaps for the first 10Mb (iteratively), or
perhaps once every 50Mb, for 1Mb. These details aren't very important and can
be tweaked later.
We introduce a configuration parameter that specifies the minimum compression
ratio that must be achieved for compression to be worth pursuing, and a ratio
of improvement required to warrant a larger block size (let's say we require
15% better compression to warrant doubling the block size).
Once we have this and our larger-than-normal buffer, we essentially perform a
binary search to find our optimal chunk size.
* We start with _no_ compression, and 64Kb chunk size with compression.
* If 64Kb is not above our minimum compression ratio, we use no compression.
* If it is, we try the mid-point _of the logarithmic scale_ (i.e. if 4Kb is our
minimum chunk size, we have possible sizes of 4, 8, 16, 32, 64, so our midpoint
would be 8 or 16)
** If that is a better choice based on our parameters, we then try the
mid-point of that and our minimum (i.e. 4 or 8), etc.
** If not, we try the mid-point above (i.e. 16 or 32), etc.
* We then use this chunk size for the contents of the buffer _and all
subsequent writes_
We should most likely short-circuit this if, on compaction, we estimate a
single partition to be larger than 64Kb; alternatively, we should set the
lower bound of our chunk size to whatever size our partitions are estimated
to be.
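A hedged sketch of that lower-bound variant (the helper name and the
round-up-to-power-of-two behaviour are my assumptions, not anything decided
here):

```python
def effective_min_chunk(estimated_partition_size,
                        min_chunk=4 * 1024, max_chunk=64 * 1024):
    """Raise the chunk-size search's lower bound so a single chunk covers
    the estimated partition size, rounded up to a power of two and capped
    at the 64Kb maximum."""
    size = min_chunk
    while size < min(estimated_partition_size, max_chunk):
        size *= 2
    return size
```

With partitions estimated above 64Kb this returns the maximum, which
degenerates into the short-circuit case above.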
There are a lot of implementation decisions to be made, such as whether the
no-compression route pretends to be a compressed file that the compressed
reader simply understands, or whether we make that decision earlier (both
have their unpleasant aspects); and whether we make a single decision
up-front or continually reassess it.
> Compressed sstables should only compress if the win is above a certain
> threshold, and should use a variable block size
> ----------------------------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-8895
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8895
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Benedict
> Assignee: Paulo Motta
> Labels: performance
> Fix For: 3.x
>
>
> On performing a flush to disk, we should assess if the data we're flushing
> will actually be substantively compressed, and how large the page should be
> to get optimal compression ratio versus read latency. Decompressing 64Kb
> chunks is wasteful when reading small records.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)