[
https://issues.apache.org/jira/browse/BEAM-8121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920886#comment-16920886
]
TJ edited comment on BEAM-8121 at 9/2/19 2:03 PM:
--------------------------------------------------
I use Dataflow runner.
I tried Reshuffle.viaRandomKey(), it worked quite poorly and had side effects:
* Pipeline started to have constant lag (2-3 min) on Reshuffle step.
* CPU usage increased from 20-30% to almost 80%. On n1-standard-4
([https://cloud.google.com/compute/docs/machine-types]).
* With single machine it was consuming less messages. 1 machine without
reshuffle step ±1500message/s, 1 machine with reshuffle step ±450message/s.
When it scaled to 5 it was consuming the same 1500message/s as without the step.
I approached google support and they advised do not use Reshuffle step at all.
Cause it's deprecated and not stable.
I assume if I would use more powerful machines and the higher number of them as
5 it would theoretically reach the max throughput, but it would be really
costly for such small message throughput.
One more thing to mention. I did the throughput test for Kafka Standalone app
and Dataflow pipeline. They were consuming from the same topic with 1
partition. Message size 1.3KB. Results:
Standalone kafka client was consuming 5000 message/s
Dataflow pipeline with few empty tasks 3200message/s
Dataflow pipeline with normal tasks what manipulates data 1500message/s
was (Author: taujan):
I use Dataflow runner.
I tried Reshuffle.viaRandomKey(), it worked quite poorly and had side effects:
* Pipeline started to have constant lag (2-3 min) on Reshuffle step.
* CPU usage increased from 20-30% to almost 80%. On n1-standard-4
([https://cloud.google.com/compute/docs/machine-types]).
* With single machine it was consuming less messages. 1 machine without
reshuffle step ±1500message/s, 1 machine with reshuffle step ±450message/s.
When it scaled to 5 it was consuming the same 1500message/s as without the step.
I approached google support and they advised do not use Reshuffle step at all.
Cause it's deprecated and not stable.
One more thing to mention. I did the throughput test for Kafka Standalone app
and Dataflow pipeline. They were consuming from the same topic with 1
partition. Results:
Standalone kafka client was consuming 4500 message/s
Dataflow pipeline with few empty tasks 3200message/s
Dataflow pipeline with normal tasks what manipulates data 1500message/s
> Messages are not distributed per machines when consuming from Kafka topic
> with 1 partition
> ------------------------------------------------------------------------------------------
>
> Key: BEAM-8121
> URL: https://issues.apache.org/jira/browse/BEAM-8121
> Project: Beam
> Issue Type: Bug
> Components: io-java-kafka
> Affects Versions: 2.14.0
> Reporter: TJ
> Priority: Major
>
> Messages are consumed from Kafka using KafkaIO. Each kafka topic contains
> only 1 partition. (That means that messages can be consumed only by one
> Consumer per 1 consumer group)
> When backlog of topic grows and system scales from 1 to X machines, all the
> messages seems to be executed on the same machine on which they are read.
> Due to that message throughput doesn't increase comparing X machines to 1
> machine. If one machine was reading 2K messages per s, X machines will be
> reading the same amount.
--
This message was sent by Atlassian Jira
(v8.3.2#803003)