[ https://issues.apache.org/jira/browse/KAFKA-3565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246460#comment-15246460 ]
Jun Rao commented on KAFKA-3565: -------------------------------- [~becket_qin], in the benchmark result that Ismael posted, each message is 1000-byte. So, it's surprising to see a performance degradation of 17% just because of the addition of an 8-byte per message overhead. The result seems different from the test that you have. Between the two tests, it seems they differ on how random the data is, the number of partitions and the message size. Could you do some more tests to see if (1) if you can reproduce the degradation from Ismael's test locally (2) what specific configurations are causing the degradation (e.g., randomness of data , # partitions)? The degradation may only happen in corner cases, but we need to understand this. Would you have time to help figure this out before releasing 0.10.0? Thanks, > Producer's throughput lower with compressed data after KIP-31/32 > ---------------------------------------------------------------- > > Key: KAFKA-3565 > URL: https://issues.apache.org/jira/browse/KAFKA-3565 > Project: Kafka > Issue Type: Bug > Reporter: Ismael Juma > Priority: Critical > Fix For: 0.10.0.0 > > > Relative offsets were introduced by KIP-31 so that the broker does not have > to recompress data (this was previously required after offsets were > assigned). The implicit assumption is that reducing CPU usage required by > recompression would mean that producer throughput for compressed data would > increase. > However, this doesn't seem to be the case: > {code} > Commit: eee95228fabe1643baa016a2d49fb0a9fe2c66bd (one before KIP-31/32) > test_id: > 2016-04-15--012.kafkatest.tests.benchmark_test.Benchmark.test_producer_throughput.topic=topic-replication-factor-three.security_protocol=PLAINTEXT.acks=1.message_size=100.compression_type=snappy > status: PASS > run time: 59.030 seconds > {"records_per_sec": 519418.343653, "mb_per_sec": 49.54} > {code} > Full results: https://gist.github.com/ijuma/0afada4ff51ad6a5ac2125714d748292 > {code} > Commit: fa594c811e4e329b6e7b897bce910c6772c46c0f (KIP-31/32) > test_id: > 2016-04-15--013.kafkatest.tests.benchmark_test.Benchmark.test_producer_throughput.topic=topic-replication-factor-three.security_protocol=PLAINTEXT.acks=1.message_size=100.compression_type=snappy > status: PASS > run time: 1 minute 0.243 seconds > {"records_per_sec": 427308.818848, "mb_per_sec": 40.75} > {code} > Full results: https://gist.github.com/ijuma/e49430f0548c4de5691ad47696f5c87d > The difference for the uncompressed case is smaller (and within what one > would expect given the additional size overhead caused by the timestamp > field): > {code} > Commit: eee95228fabe1643baa016a2d49fb0a9fe2c66bd (one before KIP-31/32) > test_id: > 2016-04-15--010.kafkatest.tests.benchmark_test.Benchmark.test_producer_throughput.topic=topic-replication-factor-three.security_protocol=PLAINTEXT.acks=1.message_size=100 > status: PASS > run time: 1 minute 4.176 seconds > {"records_per_sec": 321018.17747, "mb_per_sec": 30.61} > {code} > Full results: https://gist.github.com/ijuma/5fec369d686751a2d84debae8f324d4f > {code} > Commit: fa594c811e4e329b6e7b897bce910c6772c46c0f (KIP-31/32) > test_id: > 2016-04-15--014.kafkatest.tests.benchmark_test.Benchmark.test_producer_throughput.topic=topic-replication-factor-three.security_protocol=PLAINTEXT.acks=1.message_size=100 > status: PASS > run time: 1 minute 5.079 seconds > {"records_per_sec": 291777.608696, "mb_per_sec": 27.83} > {code} > Full results: https://gist.github.com/ijuma/1d35bd831ff9931448b0294bd9b787ed -- This message was sent by Atlassian JIRA (v6.3.4#6332)