[ https://issues.apache.org/jira/browse/KAFKA-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14908553#comment-14908553 ]

Jordan Shaw commented on KAFKA-2189:
------------------------------------
Hi all,
I was wondering whether this affects only 0.8.2.1 or 0.8.2 as well. We are on
0.8.2 and just completed a full rebalance across our brokers, yet some brokers
are at 70% disk utilization while others are at 30%. Thanks.

> Snappy compression of message batches less efficient in 0.8.2.1
> ---------------------------------------------------------------
>
> Key: KAFKA-2189
> URL: https://issues.apache.org/jira/browse/KAFKA-2189
> Project: Kafka
> Issue Type: Bug
> Components: build, compression, log
> Affects Versions: 0.8.2.1
> Reporter: Olson,Andrew
> Assignee: Ismael Juma
> Priority: Blocker
> Labels: trivial
> Fix For: 0.9.0.0, 0.8.2.2
>
> Attachments: KAFKA-2189.patch
>
>
> We are using snappy compression and noticed a fairly substantial increase
> (about 2.25x) in log filesystem space consumption after upgrading a Kafka
> cluster from 0.8.1.1 to 0.8.2.1. We found that this is caused by messages
> being seemingly recompressed individually (or possibly with a much smaller
> buffer or dictionary?) instead of as a batch as sent by producers. We
> eventually tracked down the change in compression ratio/scope to this [1]
> commit that updated the snappy version from 1.0.5 to 1.1.1.3. The Kafka
> client version does not appear to be relevant as we can reproduce this with
> both the 0.8.1.1 and 0.8.2.1 Producer.
> Here are the log files from our troubleshooting that contain the same set of
> 1000 messages, for batch sizes of 1, 10, 100, and 1000. f9d9b was the last
> commit with 0.8.1.1-like behavior prior to f5ab8 introducing the issue.
> {noformat}
> -rw-rw-r-- 1 kafka kafka 404967 May 12 11:45 /var/kafka2/f9d9b-batch-1-0/00000000000000000000.log
> -rw-rw-r-- 1 kafka kafka 119951 May 12 11:45 /var/kafka2/f9d9b-batch-10-0/00000000000000000000.log
> -rw-rw-r-- 1 kafka kafka 89645 May 12 11:45 /var/kafka2/f9d9b-batch-100-0/00000000000000000000.log
> -rw-rw-r-- 1 kafka kafka 88279 May 12 11:45 /var/kafka2/f9d9b-batch-1000-0/00000000000000000000.log
> -rw-rw-r-- 1 kafka kafka 402837 May 12 11:41 /var/kafka2/f5ab8-batch-1-0/00000000000000000000.log
> -rw-rw-r-- 1 kafka kafka 382437 May 12 11:41 /var/kafka2/f5ab8-batch-10-0/00000000000000000000.log
> -rw-rw-r-- 1 kafka kafka 364791 May 12 11:41 /var/kafka2/f5ab8-batch-100-0/00000000000000000000.log
> -rw-rw-r-- 1 kafka kafka 380693 May 12 11:41 /var/kafka2/f5ab8-batch-1000-0/00000000000000000000.log
> {noformat}
> [1]
> https://github.com/apache/kafka/commit/f5ab8e1780cf80f267906e3259ad4f9278c32d28
>
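
For anyone trying to build intuition for why per-message compression would
inflate the log this much, here is a rough sketch of the effect. It is not
Kafka's code path and it does not use snappy: zlib from the Python standard
library stands in for the codec, and the message payloads, `make_messages`,
and the two size helpers are hypothetical names made up for illustration. It
only shows that compressing similar messages one at a time costs more bytes
than compressing them together as a batch.

```python
import zlib


def make_messages(count):
    # Hypothetical payloads: same structure, slightly different field values.
    return [
        ('{"event":"page_view","user_id":%d,"path":"/products/%d"}'
         % (i, i % 50)).encode("utf-8")
        for i in range(count)
    ]


def size_compressed_individually(messages):
    # Each message is compressed on its own, so redundancy shared
    # between messages cannot be exploited.
    return sum(len(zlib.compress(m)) for m in messages)


def size_compressed_in_batches(messages, batch_size):
    # Messages are concatenated into batches and each batch is compressed
    # as one unit (zlib is an assumption here, standing in for snappy).
    total = 0
    for start in range(0, len(messages), batch_size):
        batch = b"".join(messages[start:start + batch_size])
        total += len(zlib.compress(batch))
    return total


if __name__ == "__main__":
    messages = make_messages(1000)
    print("individually:", size_compressed_individually(messages), "bytes")
    for batch_size in (1, 10, 100, 1000):
        size = size_compressed_in_batches(messages, batch_size)
        print("batch size", batch_size, "->", size, "bytes")
```

With payloads like these, the total should shrink as the batch size grows,
which resembles the f9d9b listing above, whereas the f5ab8 sizes stay close to
the batch-size-1 figure. The exact numbers depend on the codec and the data,
so treat this as an illustration of the mechanism only.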