[ https://issues.apache.org/jira/browse/KAFKA-9716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Christian Kosmowski updated KAFKA-9716:
---------------------------------------
Priority: Minor (was: Major)
> Values of compression-rate and compression-rate-avg are misleading
> ------------------------------------------------------------------
>
> Key: KAFKA-9716
> URL: https://issues.apache.org/jira/browse/KAFKA-9716
> Project: Kafka
> Issue Type: Bug
> Components: clients, compression
> Affects Versions: 2.4.1
> Reporter: Christian Kosmowski
> Priority: Minor
>
> The values of the following metrics, compression-rate and compression-rate-avg
> (and basically every other compression-rate metric, e.g. the topic compression
> rate), are confusing.
> They are calculated as follows:
> {code:java}
> if (numRecords == 0L) {
>     // no records: reset the buffer and return an empty result
>     buffer().position(initialPosition);
>     builtRecords = MemoryRecords.EMPTY;
> } else {
>     // ratio = compressed bytes written / uncompressed bytes
>     if (magic > RecordBatch.MAGIC_VALUE_V1)
>         this.actualCompressionRatio = (float) writeDefaultBatchHeader() / this.uncompressedRecordsSizeInBytes;
>     else if (compressionType != CompressionType.NONE)
>         this.actualCompressionRatio = (float) writeLegacyCompressedWrapperHeader() / this.uncompressedRecordsSizeInBytes;
>     ByteBuffer buffer = buffer().duplicate();
>     buffer.flip();
>     buffer.position(initialPosition);
>     builtRecords = MemoryRecords.readableRecords(buffer.slice());
> }
> {code}
> Basically, the compressed size is divided by the uncompressed size, which leads
> to a value < 1 for good compression (what you want if you enable compression)
> and > 1 for poor compression (what you don't want).
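The effect of this formula is easy to reproduce outside Kafka. The sketch below is a hypothetical, self-contained Java example (the class CompressionRateDemo and its methods are illustrative names, not Kafka code). It uses java.util.zip.Deflater as a stand-in for Kafka's codecs, which is an assumption: real batches use gzip, snappy, lz4 or zstd and will produce different numbers. For the same payload it computes both the Kafka-style value (compressed / uncompressed, as in the snippet) and the conventional ratio (uncompressed / compressed).

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

// Hypothetical demo: Deflater stands in for Kafka's compression codecs.
public class CompressionRateDemo {

    // Compress the input with Deflater and return the compressed size in bytes.
    public static int deflatedSize(byte[] input) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(input);
            deflater.finish();
            byte[] chunk = new byte[4096];
            int total = 0;
            while (!deflater.finished()) {
                total += deflater.deflate(chunk);
            }
            return total;
        } finally {
            deflater.end(); // release native zlib resources
        }
    }

    // Same formula as the quoted snippet: compressed / uncompressed.
    public static float kafkaStyleRate(int compressed, int uncompressed) {
        return (float) compressed / uncompressed;
    }

    // Conventional compression ratio: uncompressed / compressed.
    public static float conventionalRatio(int compressed, int uncompressed) {
        return (float) uncompressed / compressed;
    }

    public static void main(String[] args) {
        // Highly repetitive payload, so it compresses well.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("kafka-record-value;");
        }
        byte[] payload = sb.toString().getBytes(StandardCharsets.UTF_8);

        int uncompressed = payload.length;
        int compressed = deflatedSize(payload);
        System.out.printf("uncompressed=%d compressed=%d%n", uncompressed, compressed);
        System.out.printf("Kafka-style rate   = %.4f (< 1 means good compression)%n",
                kafkaStyleRate(compressed, uncompressed));
        System.out.printf("conventional ratio = %.2f (> 1 means good compression)%n",
                conventionalRatio(compressed, uncompressed));
    }
}
```

On this repetitive payload the Kafka-style value should come out well below 1 and the conventional ratio well above 1; the two are reciprocals of each other.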
> From the name "compression rate", I would expect the exact opposite. Apart
> from the fact that the word "rate" usually refers to a comparison of values in
> different units (e.g. miles per hour), the correct word, "ratio", would refer
> to the uncompressed size divided by the compressed size.
> So if the compressed data takes half the space of the uncompressed data, the
> correct value for the compression ratio (or rate) would be 2, not 0.5 as Kafka
> reports it. That is really confusing, and I would AT LEAST expect this
> behaviour to be documented somewhere, but it is not: all documentation
> sources just say "the compression rate".
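Because the reported value is simply the reciprocal of the conventional ratio, it can be converted when reading the metric. This is a minimal sketch, assuming the reported value is exactly compressed / uncompressed as in the quoted code; CompressionRateConverter and its methods are hypothetical names, not part of the Kafka client API.

```java
// Hypothetical helper: convert Kafka's reported "compression-rate"
// (compressed / uncompressed) into more intuitive figures.
public final class CompressionRateConverter {

    private CompressionRateConverter() {}

    // Conventional compression ratio: uncompressed / compressed.
    // A reported rate of 0.5 (data shrunk to half its size) becomes 2.0.
    public static double toCompressionRatio(double reportedRate) {
        if (reportedRate <= 0.0) {
            throw new IllegalArgumentException("rate must be positive: " + reportedRate);
        }
        return 1.0 / reportedRate;
    }

    // Space savings in percent: 0.5 -> 50.0; 1.2 -> about -20 (the data grew).
    public static double toSpaceSavingsPercent(double reportedRate) {
        return (1.0 - reportedRate) * 100.0;
    }

    public static void main(String[] args) {
        double reported = 0.5; // compressed data takes half the space
        System.out.println("compression ratio = " + toCompressionRatio(reported));
        System.out.println("space savings (%) = " + toSpaceSavingsPercent(reported));
    }
}
```

With the example from the text, a reported value of 0.5 converts to a compression ratio of 2.0 and 50% space savings.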
--
This message was sent by Atlassian Jira
(v8.3.4#803005)