[
https://issues.apache.org/jira/browse/CASSANDRA-10520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15861316#comment-15861316
]
Robert Stupp commented on CASSANDRA-10520:
------------------------------------------
Generally, the whole patch LGTM. Great work!
Some notes:
* Even if CRC checks are disables, we always call
{{ThreadLocalRandom.current().nextDouble()}}, which seems unnecessary. I've
played around with that in [this
commit|https://github.com/snazy/cassandra/commit/e30dcdef64ccdcff75043d50a50ecb95c26c9667].
Feel free to include it in your branch, if you think it's ok and fine to sneak
this into this ticket.
* When writing compressed chunks, the {{compressed}} buffer is sized to the max
compression length. WDYT about just passing a buffer that's bounded to
{{maxCompressedLength}} and handle the buffer-overflow-exception to write it
uncompressed? My (unproven) hope is that compression can stop earlier - i.e.
would not need to compress to more than {{maxCompressedLength}}.
* Just for clarification - is the following correct?
** on write, if {{compressedDataLength > maxCompressedLength}}, data is written
uncompressed, compressed otherwise
** on read, if {{chunkLength <= maxCompressedLength}} data is read compressed,
uncompressed otherwise.
* Nice minor catches btw - like the power-of-2 check and removal of boxed types
The micro benchmark looks different on my Linux machine (also [added
{{disk_access_mode=standard}}|https://github.com/snazy/cassandra/commit/87c528fa57c18900f5ad075fb564559ba9368944]
for completeness). There are probably a lot of reasons why our JMH runs
differ. Anyway, although my run shows not that big difference, it is still
worth to do this as probably no-one wants to let data unnecessarily become
bigger because of compression.
{code}
[java] Benchmark (accessMode)
(compression) Mode Cnt
Score Error Units
[java] ReadWriteTestCompression.readFixed mmap
{'enabled':false} avgt 15
5.549 ± 0.314 us/op
[java] ReadWriteTestCompression.readFixed mmap
{'class': 'LZ4Compressor', 'crc_check_chance': 0.0} avgt 15
5.732 ± 0.126 us/op
[java] ReadWriteTestCompression.readFixed mmap {'class':
'LZ4Compressor', 'min_compress_ratio': 0.0, 'crc_check_chance': 0.0} avgt 15
5.241 ± 0.108 us/op
[java] ReadWriteTestCompression.readFixed mmap
{'class': 'LZ4Compressor'} avgt 15
5.840 ± 0.388 us/op
[java] ReadWriteTestCompression.readFixed mmap
{'class': 'LZ4Compressor', 'min_compress_ratio': 0.0} avgt 15
5.594 ± 0.085 us/op
[java] ReadWriteTestCompression.readFixed mmap_index_only
{'enabled':false} avgt 15
5.542 ± 0.128 us/op
[java] ReadWriteTestCompression.readFixed mmap_index_only
{'class': 'LZ4Compressor', 'crc_check_chance': 0.0} avgt 15
5.580 ± 0.027 us/op
[java] ReadWriteTestCompression.readFixed mmap_index_only {'class':
'LZ4Compressor', 'min_compress_ratio': 0.0, 'crc_check_chance': 0.0} avgt 15
5.431 ± 0.144 us/op
[java] ReadWriteTestCompression.readFixed mmap_index_only
{'class': 'LZ4Compressor'} avgt 15
5.132 ± 0.089 us/op
[java] ReadWriteTestCompression.readFixed mmap_index_only
{'class': 'LZ4Compressor', 'min_compress_ratio': 0.0} avgt 15
5.531 ± 0.104 us/op
[java] ReadWriteTestCompression.readFixed standard
{'enabled':false} avgt 15
5.490 ± 0.016 us/op
[java] ReadWriteTestCompression.readFixed standard
{'class': 'LZ4Compressor', 'crc_check_chance': 0.0} avgt 15
5.479 ± 0.088 us/op
[java] ReadWriteTestCompression.readFixed standard {'class':
'LZ4Compressor', 'min_compress_ratio': 0.0, 'crc_check_chance': 0.0} avgt 15
5.480 ± 0.152 us/op
[java] ReadWriteTestCompression.readFixed standard
{'class': 'LZ4Compressor'} avgt 15
5.328 ± 0.346 us/op
[java] ReadWriteTestCompression.readFixed standard
{'class': 'LZ4Compressor', 'min_compress_ratio': 0.0} avgt 15
5.552 ± 0.123 us/op
[java] ReadWriteTestCompression.readRandom mmap
{'enabled':false} avgt 15
6.142 ± 0.072 us/op
[java] ReadWriteTestCompression.readRandom mmap
{'class': 'LZ4Compressor', 'crc_check_chance': 0.0} avgt 15
6.121 ± 0.060 us/op
[java] ReadWriteTestCompression.readRandom mmap {'class':
'LZ4Compressor', 'min_compress_ratio': 0.0, 'crc_check_chance': 0.0} avgt 15
5.721 ± 0.095 us/op
[java] ReadWriteTestCompression.readRandom mmap
{'class': 'LZ4Compressor'} avgt 15
5.872 ± 0.089 us/op
[java] ReadWriteTestCompression.readRandom mmap
{'class': 'LZ4Compressor', 'min_compress_ratio': 0.0} avgt 15
6.383 ± 0.118 us/op
[java] ReadWriteTestCompression.readRandom mmap_index_only
{'enabled':false} avgt 15
5.886 ± 0.117 us/op
[java] ReadWriteTestCompression.readRandom mmap_index_only
{'class': 'LZ4Compressor', 'crc_check_chance': 0.0} avgt 15
6.328 ± 0.111 us/op
[java] ReadWriteTestCompression.readRandom mmap_index_only {'class':
'LZ4Compressor', 'min_compress_ratio': 0.0, 'crc_check_chance': 0.0} avgt 15
5.779 ± 0.178 us/op
[java] ReadWriteTestCompression.readRandom mmap_index_only
{'class': 'LZ4Compressor'} avgt 15
5.945 ± 0.119 us/op
[java] ReadWriteTestCompression.readRandom mmap_index_only
{'class': 'LZ4Compressor', 'min_compress_ratio': 0.0} avgt 15
5.779 ± 0.183 us/op
[java] ReadWriteTestCompression.readRandom standard
{'enabled':false} avgt 15
6.257 ± 0.049 us/op
[java] ReadWriteTestCompression.readRandom standard
{'class': 'LZ4Compressor', 'crc_check_chance': 0.0} avgt 15
5.755 ± 0.064 us/op
[java] ReadWriteTestCompression.readRandom standard {'class':
'LZ4Compressor', 'min_compress_ratio': 0.0, 'crc_check_chance': 0.0} avgt 15
5.864 ± 0.144 us/op
[java] ReadWriteTestCompression.readRandom standard
{'class': 'LZ4Compressor'} avgt 15
6.323 ± 0.087 us/op
[java] ReadWriteTestCompression.readRandom standard
{'class': 'LZ4Compressor', 'min_compress_ratio': 0.0} avgt 15
5.858 ± 0.028 us/op
{code}
> Compressed writer and reader should support non-compressed data.
> ----------------------------------------------------------------
>
> Key: CASSANDRA-10520
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10520
> Project: Cassandra
> Issue Type: Improvement
> Components: Local Write-Read Paths
> Reporter: Branimir Lambov
> Assignee: Branimir Lambov
> Labels: messaging-service-bump-required
> Fix For: 4.x
>
> Attachments: ReadWriteTestCompression.java
>
>
> Compressing uncompressible data, as done, for instance, to write SSTables
> during stress-tests, results in chunks larger than 64k which are a problem
> for the buffer pooling mechanisms employed by the
> {{CompressedRandomAccessReader}}. This results in non-negligible performance
> issues due to excessive memory allocation.
> To solve this problem and avoid decompression delays in the cases where it
> does not provide benefits, I think we should allow compressed files to store
> uncompressed chunks as alternative to compressed data. Such a chunk could be
> written after compression returns a buffer larger than, for example, 90% of
> the input, and would not result in additional delays in writing. On reads it
> could be recognized by size (using a single global threshold constant in the
> compression metadata) and data could be directly transferred into the
> decompressed buffer, skipping the decompression step and ensuring a 64k
> buffer for compressed data always suffices.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)