[ 
https://issues.apache.org/jira/browse/KAFKA-5236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16009863#comment-16009863
 ] 

Ismael Juma commented on KAFKA-5236:
------------------------------------

Thanks for the report. The root cause is that the block size for Snappy was 
changed from 32 KB to 1 KB in the broker:

https://github.com/apache/kafka/pull/2140#discussion_r90383989

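1 KB is the same block size the producer uses, and with the 0.10.x message 
format the broker won't recompress the messages in the common case. That is 
presumably why the regression only shows up with the 0.8.2 format, where the 
broker has to recompress messages sent by 0.10.x clients.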
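
As a rough illustration of why a smaller block size costs disk space, here is 
a minimal sketch that compresses the same payload in independent 1 KB and 
32 KB blocks. It makes several assumptions: zlib from the Python standard 
library stands in for Snappy (which is not in the standard library), the 
payload is made up, and `compressed_size` is a hypothetical helper, not 
anything in Kafka. The absolute numbers will not match what a broker writes; 
only the direction of the effect is the point.

```python
import zlib


def compressed_size(data: bytes, block_size: int) -> int:
    """Total size when each block_size chunk is compressed independently."""
    total = 0
    for start in range(0, len(data), block_size):
        # Each block starts with no history, so redundancy that spans
        # block boundaries cannot be exploited by the compressor.
        total += len(zlib.compress(data[start:start + block_size]))
    return total


# Made-up payload: repetitive JSON-like records (an assumption, not real data).
record_template = '{"user_id": %d, "event": "page_view", "path": "/products/%d"}\n'
payload = "".join(record_template % (i, i % 50) for i in range(5000)).encode()

small = compressed_size(payload, 1024)        # 1 KB blocks
large = compressed_size(payload, 32 * 1024)   # 32 KB blocks

print("uncompressed:", len(payload))
print("1 KB blocks: ", small)
print("32 KB blocks:", large)
```

With smaller blocks the compressor restarts more often and pays the per-block 
overhead more times, so the total output is larger for the same input.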

KAFKA-5148 and KAFKA-3704 are related. We should probably use the default block 
size (32 KB) in both the broker and the producer, and allow the block size to 
be configurable as per those JIRAs.

> Regression in on-disk log size when using Snappy compression with 0.8.2 log 
> message format
> ------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-5236
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5236
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.10.2.1
>            Reporter: Nick Travers
>
> We recently upgraded our brokers in our production environments from 0.10.1.1 
> to 0.10.2.1 and we've noticed a sizable regression in the on-disk .log file 
> size. For some deployments the increase was as much as 50%.
> We run our brokers with the 0.8.2 log message format version. The majority of 
> our message volume comes from 0.10.x Java clients sending messages encoded 
> with the Snappy codec.
> Initial testing shows a regression between the two versions only when 
> using Snappy compression with a log message format of 0.8.2.
> I also tested 0.10.x log message formats and Gzip compression. The log 
> sizes do not differ in those cases, so the issue seems confined to the 0.8.2 
> message format and Snappy compression.
> A git-bisect led me to this commit, which modified the server-side 
> implementation of `Record`:
> https://github.com/apache/kafka/commit/67f1e5b91bf073151ff57d5d656693e385726697
> Here's the PR, which has more context:
> https://github.com/apache/kafka/pull/2140
> Here is a link to the test I used to reproduce this issue:
> https://github.com/nicktrav/kafka/commit/68e8db4fa525e173651ac740edb270b0d90b8818
> cc: [~hachikuji] [~junrao] [~ijuma] [~guozhang] (tagged on original PR)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)