[jira] [Commented] (KAFKA-3565) Producer's throughput lower with compressed data after KIP-31/32

Jiangjie Qin (JIRA) Mon, 18 Apr 2016 22:19:39 -0700

    [ 
https://issues.apache.org/jira/browse/KAFKA-3565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15247198#comment-15247198
 ]


Jiangjie Qin commented on KAFKA-3565:
-------------------------------------

[~junrao] [~ijuma] I tried a few more things, including:
1) tweaking the compressor buffer size
2) using acks=0, which should eliminate the broker impact
3) changing the random integer range between 50 and 50000

I was still not able to reproduce the performance gap between 0.9 and trunk 
that Ismael observed.

To answer Ismael's questions:
1. I am not sure about the reason for the slowdown you saw. Is it consistently 
reproducible? I am curious about where the actual bottleneck is. 
I ran a few tests with 3 producers, using acks=0 to take the broker out of the 
picture. I compared the results between git hash 
eee95228fabe1643baa016a2d49fb0a9fe2c66bd and trunk. In most cases, trunk even 
seems to perform better than 0.9.

2. It really depends: existing users may or may not suffer a slowdown when 
they move to 0.10.0. If the producing bottleneck is on the broker, they 
will see an improvement. In most cases, if they switch to a 0.10.0 producer, it 
seems they can get some performance improvement. But if the users have already 
maxed out the CPU, they may see lower throughput.

I agree with Jun that it is important to understand why the gap is there if it 
is consistently reproducible. It would be helpful if we could get the following 
metrics when running the tests:
1. actual batch size
2. actual request size
3. request latency
4. request rate
5. records queue time

To reduce the variable factors, I recommend we use 1 partition, acks=1, 
batch.size=500K, linger.ms=0 (replication factor should not matter here). 
The variable parameters are: max.in.flight.requests.per.connection=1 and 100, 
random integer range from 10 to 50000, compression.type=gzip and snappy.
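
For reference, the test combinations proposed above can be enumerated with a 
short script. This is only a sketch: reading 500K as 500 * 1024 bytes, using 
only the two endpoints (10 and 50000) of the random integer range, and the 
names build_test_matrix and value_range are assumptions made for illustration, 
not part of any existing benchmark tool.

```python
from itertools import product

# Fixed settings from the proposal. Interpreting "500K" as 500 * 1024 bytes
# is an assumption; it could also mean 500000.
FIXED_PRODUCER_CONFIG = {
    "acks": "1",
    "batch.size": str(500 * 1024),
    "linger.ms": "0",
}
NUM_PARTITIONS = 1

# Variable settings. Only the endpoints of the random integer range are
# listed here (an assumption); intermediate values could be added.
MAX_IN_FLIGHT = [1, 100]
VALUE_RANGE = [10, 50000]
COMPRESSION_TYPES = ["gzip", "snappy"]


def build_test_matrix():
    """Return one dict per test run, covering every parameter combination."""
    runs = []
    for in_flight, value_range, codec in product(
        MAX_IN_FLIGHT, VALUE_RANGE, COMPRESSION_TYPES
    ):
        config = dict(FIXED_PRODUCER_CONFIG)
        config["max.in.flight.requests.per.connection"] = str(in_flight)
        config["compression.type"] = codec
        runs.append(
            {
                "partitions": NUM_PARTITIONS,
                "value_range": value_range,  # hypothetical payload parameter
                "producer_config": config,
            }
        )
    return runs


if __name__ == "__main__":
    for run in build_test_matrix():
        print(run)
```

With two values for each of the three variable parameters, this gives 8 runs 
per code version under test.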

I will run the tests tomorrow and let you know the results I saw. Please let me 
know if you think we should add more test parameters. Thanks.


> Producer's throughput lower with compressed data after KIP-31/32
> ----------------------------------------------------------------
>
>                 Key: KAFKA-3565
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3565
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Ismael Juma
>            Priority: Critical
>             Fix For: 0.10.0.0
>
>
> Relative offsets were introduced by KIP-31 so that the broker does not have 
> to recompress data (this was previously required after offsets were 
> assigned). The implicit assumption is that reducing CPU usage required by 
> recompression would mean that producer throughput for compressed data would 
> increase.
> However, this doesn't seem to be the case:
> {code}
> Commit: eee95228fabe1643baa016a2d49fb0a9fe2c66bd (one before KIP-31/32)
> test_id:    
> 2016-04-15--012.kafkatest.tests.benchmark_test.Benchmark.test_producer_throughput.topic=topic-replication-factor-three.security_protocol=PLAINTEXT.acks=1.message_size=100.compression_type=snappy
> status:     PASS
> run time:   59.030 seconds
> {"records_per_sec": 519418.343653, "mb_per_sec": 49.54}
> {code}
> Full results: https://gist.github.com/ijuma/0afada4ff51ad6a5ac2125714d748292
> {code}
> Commit: fa594c811e4e329b6e7b897bce910c6772c46c0f (KIP-31/32)
> test_id:    
> 2016-04-15--013.kafkatest.tests.benchmark_test.Benchmark.test_producer_throughput.topic=topic-replication-factor-three.security_protocol=PLAINTEXT.acks=1.message_size=100.compression_type=snappy
> status:     PASS
> run time:   1 minute 0.243 seconds
> {"records_per_sec": 427308.818848, "mb_per_sec": 40.75}
> {code}
> Full results: https://gist.github.com/ijuma/e49430f0548c4de5691ad47696f5c87d
> The difference for the uncompressed case is smaller (and within what one 
> would expect given the additional size overhead caused by the timestamp 
> field):
> {code}
> Commit: eee95228fabe1643baa016a2d49fb0a9fe2c66bd (one before KIP-31/32)
> test_id:    
> 2016-04-15--010.kafkatest.tests.benchmark_test.Benchmark.test_producer_throughput.topic=topic-replication-factor-three.security_protocol=PLAINTEXT.acks=1.message_size=100
> status:     PASS
> run time:   1 minute 4.176 seconds
> {"records_per_sec": 321018.17747, "mb_per_sec": 30.61}
> {code}
> Full results: https://gist.github.com/ijuma/5fec369d686751a2d84debae8f324d4f
> {code}
> Commit: fa594c811e4e329b6e7b897bce910c6772c46c0f (KIP-31/32)
> test_id:    
> 2016-04-15--014.kafkatest.tests.benchmark_test.Benchmark.test_producer_throughput.topic=topic-replication-factor-three.security_protocol=PLAINTEXT.acks=1.message_size=100
> status:     PASS
> run time:   1 minute 5.079 seconds
> {"records_per_sec": 291777.608696, "mb_per_sec": 27.83}
> {code}
> Full results: https://gist.github.com/ijuma/1d35bd831ff9931448b0294bd9b787ed
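
To put the quoted numbers side by side, the relative throughput drop can be 
computed from the records_per_sec values in the four results above. This is a 
small sketch; relative_drop is a helper name introduced here for illustration.

```python
def relative_drop(before, after):
    """Percentage by which throughput fell going from `before` to `after`."""
    return (before - after) / before * 100.0


# records_per_sec values copied from the quoted benchmark results.
snappy_drop = relative_drop(519418.343653, 427308.818848)
uncompressed_drop = relative_drop(321018.17747, 291777.608696)

if __name__ == "__main__":
    print("snappy:       %.1f%% lower after KIP-31/32" % snappy_drop)
    print("uncompressed: %.1f%% lower after KIP-31/32" % uncompressed_drop)
```

The snappy case drops by roughly 17.7%, against roughly 9.1% for the 
uncompressed case.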



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
