[ 
https://issues.apache.org/jira/browse/KAFKA-4474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15748575#comment-15748575
 ] 

Juan Chorro edited comment on KAFKA-4474 at 12/14/16 3:17 PM:
--------------------------------------------------------------

Hi again!

We have been doing more performance tests and we have observed some anomalous 
behavior. All the information we gathered is in the link below:

https://docs.google.com/spreadsheets/d/1Ywj2nlKoBdu6fX_fkPeY7ebBqXtEo79x9uyIhyZVB7A/edit?usp=sharing

In all cases each service has its own node. We obtained the consumer and 
producer throughput over the JMX protocol with the jconsole tool.
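As a sketch of what jconsole does under the hood, here is a minimal stdlib-only 
Java example of reading an MBean attribute (we used jconsole, not code like 
this; the query below targets a standard local MBean purely for illustration, 
since the exact Kafka rate MBean names depend on the client-id):

```java
// Minimal sketch of reading a metric over JMX programmatically, as an
// alternative to jconsole. This queries the local platform MBean server;
// for a remote kafka-streams app you would instead connect through
// JMXConnectorFactory to its JMX service URL and read the consumer and
// producer rate MBeans (exact object names depend on the client-id).
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class JmxRead {
    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // A standard MBean that is always present, used here for illustration.
        ObjectName os = new ObjectName("java.lang:type=OperatingSystem");
        Object cpus = server.getAttribute(os, "AvailableProcessors");
        System.out.println(cpus);
    }
}
```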

We have a synthetic producer that injects ~100K messages per second into the 
kafka-streams application, and we can see the following cases:

* Case A: We have 1 zookeeper server, 1 kafka-broker and 1 kafka-streams app. 
We also have two topics, input and output, with 4 partitions each. The 
synthetic producer injects ~100K messages per second into the input topic, the 
kafka-streams app consumes ~20K messages per second, and it produces ~4K 
messages per second to the output topic. Where are the other ~16K messages per 
second going, given that I don't observe an excessive RAM increase?

* Case B: We have 1 zookeeper server, 2 kafka-brokers and 1 kafka-streams app. 
We have two topics, input and output, with 2 partitions each. The synthetic 
producer injects ~100K messages per second into the input topic, the 
kafka-streams app consumes ~100K messages per second, and it produces ~100K 
messages per second to the output topic. This case behaves correctly!

* Case C: We have 1 zookeeper server, 2 kafka-brokers and 1 kafka-streams app. 
We have two topics, input and output, with 4 partitions each. The synthetic 
producer injects ~100K messages per second into the input topic, the 
kafka-streams app consumes ~20K messages per second, and it produces ~4K 
messages per second to the output topic. This case is the same as Case A but 
with a different number of kafka-brokers.

* Case D: We have 1 zookeeper server, 4 kafka-brokers and 1 kafka-streams app. 
We have two topics, input and output, with 4 partitions each. The synthetic 
producer injects ~100K messages per second into the input topic, the 
kafka-streams app consumes ~820K messages per second, and it produces ~100K 
messages per second to the output topic. In this case both the synthetic 
producer and the kafka-streams producer have the same throughput, but the 
kafka-streams consumer reads ~820K messages per second and I don't know why.

I don't understand why the consumer's throughput drops from Case B to Case C 
when the number of partitions increases.

Does it seem as strange to you as it does to me?
Or am I misunderstanding some concepts?

If you need anything else, feel free to ask me for it.
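For context on the Case B vs. Case C question: as far as I understand, Kafka 
Streams creates one stream task per input partition and divides those tasks 
among all stream threads across all app instances, so a single thread owns 
every task when num.stream.threads is 1. A rough illustrative sketch of that 
arithmetic (the helper names are hypothetical, not Kafka APIs):

```java
// Illustrative arithmetic only, not a Kafka API: Kafka Streams creates one
// stream task per input partition and spreads those tasks over all stream
// threads of all app instances; threads beyond the partition count sit idle.
public class StreamsTaskMath {
    static int tasksPerBusiestThread(int partitions, int instances,
                                     int threadsPerInstance) {
        int totalThreads = instances * threadsPerInstance;
        int activeThreads = Math.min(totalThreads, partitions);
        // Ceiling division: the most loaded thread owns this many tasks.
        return (partitions + activeThreads - 1) / activeThreads;
    }

    public static void main(String[] args) {
        System.out.println(tasksPerBusiestThread(4, 1, 1)); // Cases A/C: 4 tasks on one thread
        System.out.println(tasksPerBusiestThread(2, 1, 1)); // Case B: 2 tasks on one thread
        System.out.println(tasksPerBusiestThread(4, 1, 4)); // 1 task per thread
    }
}
```

Under this assumption the single thread interleaves twice as many tasks in 
Case C as in Case B, which may relate to the throughput drop, although it does 
not explain its magnitude.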



> Poor kafka-streams throughput
> -----------------------------
>
>                 Key: KAFKA-4474
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4474
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 0.10.1.0
>            Reporter: Juan Chorro
>            Assignee: Eno Thereska
>         Attachments: hctop sreenshot.png
>
>
> Hi! 
> I'm writing because I'm concerned about kafka-streams throughput.
> I have a single kafka-streams application instance that consumes from the 
> 'input' topic, prints to the screen and produces to the 'output' topic. All 
> topics have 4 partitions. As can be seen, the topology is very simple.
> I produce 120K messages/second to the 'input' topic; when I measure the 
> 'output' topic I detect that I'm receiving only ~4K messages/second. I had 
> the following configuration (remaining parameters at their defaults):
> application.id: myApp
> bootstrap.servers: localhost:9092
> zookeeper.connect: localhost:2181
> num.stream.threads: 1
> I did proofs and tests without success, but when I created a new 'input' 
> topic with 1 partition (keeping the 'output' topic with 4 partitions) I got 
> 120K messages/second in the 'output' topic.
> I have been doing some performance tests with the following cases (all 
> topics have 4 partitions in all cases):
> Case A - 1 Instance:
> - With num.stream.threads set to 1 I had ~3785 messages/second
> - With num.stream.threads set to 2 I had ~3938 messages/second
> - With num.stream.threads set to 4 I had ~120K messages/second
> Case B - 2 Instances:
> - With num.stream.threads set to 1 I had ~3930 messages/second for each 
> instance (and a total throughput of ~8K messages/second)
> - With num.stream.threads set to 2 I had ~3945 messages/second for each 
> instance (and more or less the same total throughput as with 
> num.stream.threads set to 1)
> Case C - 4 Instances:
> - With num.stream.threads set to 1 I had ~3946 messages/second for each 
> instance (and a total throughput of ~17K messages/second)
> As can be observed, I get the best results when num.stream.threads is set 
> to the number of partitions. So I have the following questions:
> - Why, when I have a topic with more than 1 partition and 
> num.stream.threads set to 1, do I always get ~4K messages/second?
> - In Case C, 4 instances with num.stream.threads set to 1 should perform 
> better than 1 instance with num.stream.threads set to 4. Is this 
> supposition correct?
> This is the kafka-streams application that I use: 
> https://gist.github.com/Chorro/5522ec4acd1a005eb8c9663da86f5a18



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
