[ https://issues.apache.org/jira/browse/BEAM-8121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17090856#comment-17090856 ]
Chamikara Madhusanka Jayalath commented on BEAM-8121:
-----------------------------------------------------
Alexey and TJ, I think we might want to update the source to parallelize better
when there's only one subscription (that is, one topic partition) or only a
limited number of subscribers.
Using Reshuffle might work for some customers, but reading will still be
limited to one worker when there's only one subscription.
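The following rough plain-Python model shows why Reshuffle helps only after the read. It is not Beam code, and the function names and one-reader setup are hypothetical. One worker reads every message from the single partition. A random-key reshuffle, similar in spirit to Beam's Reshuffle.viaRandomKey(), then routes each message to a downstream worker. Reads stay on one worker, and only the downstream processing spreads out.

```python
import random
from collections import Counter


def read_single_partition(messages, reader_worker=0):
    """Model: with one partition, a single worker performs every read."""
    return [(reader_worker, m) for m in messages]


def reshuffle_via_random_key(read_records, num_workers, seed=0):
    """Model of a random-key reshuffle: each element draws a random key,
    and that key decides which worker processes it downstream."""
    rng = random.Random(seed)
    return [(rng.randrange(num_workers), m) for _, m in read_records]


messages = list(range(10_000))
read = read_single_partition(messages)
shuffled = reshuffle_via_random_key(read, num_workers=4)

reads_per_worker = Counter(w for w, _ in read)
processing_per_worker = Counter(w for w, _ in shuffled)
print(reads_per_worker)       # every read happened on worker 0
print(processing_per_worker)  # processing is spread across workers 0-3
```

The read count per worker never changes. This matches the point above that reading stays limited to one worker.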
We might be able to parallelize further by splitting the offset sequence
across concurrent workers.
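Splitting the offset sequence could start from arithmetic like the following sketch. It divides a bounded backlog [start, end) of one partition into contiguous, non-overlapping sub-ranges, one per worker. It is a hypothetical helper, not part of KafkaIO. A real source would also need three more things:

* each worker's consumer to seek to its start offset (for example with the Kafka Java client's KafkaConsumer.seek)
* each worker to stop at its end offset
* a way to handle an unbounded tail whose end keeps moving

```python
def split_offsets(start, end, num_workers):
    """Split the half-open offset range [start, end) into at most
    num_workers contiguous, non-overlapping sub-ranges of near-equal size."""
    if end <= start or num_workers < 1:
        return []
    total = end - start
    n = min(num_workers, total)          # never create empty ranges
    base, extra = divmod(total, n)       # the first `extra` ranges get one more offset
    ranges = []
    lo = start
    for i in range(n):
        size = base + (1 if i < extra else 0)
        ranges.append((lo, lo + size))
        lo += size
    return ranges


# Example: a backlog of offsets 1000..1999 in one partition, four workers.
for worker, (lo, hi) in enumerate(split_offsets(1000, 2000, 4)):
    print(f"worker {worker}: offsets [{lo}, {hi})")
```

Contiguous ranges keep each worker's reads sequential within the partition. Offsets that arrive after the split still need some further assignment scheme.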
> Messages are not distributed per machines when consuming from Kafka topic
> with 1 partition
> ------------------------------------------------------------------------------------------
>
> Key: BEAM-8121
> URL: https://issues.apache.org/jira/browse/BEAM-8121
> Project: Beam
> Issue Type: Bug
> Components: io-java-kafka
> Affects Versions: 2.14.0
> Reporter: TJ
> Assignee: Alexey Romanenko
> Priority: Major
> Fix For: Not applicable
>
> Attachments: datalake-dataflow-cleaned.zip
>
>
> Messages are consumed from Kafka using KafkaIO. Each Kafka topic contains
> only 1 partition. (That means messages can be consumed by only one
> consumer per consumer group.)
> When the topic backlog grows and the system scales from 1 to X machines, all
> messages seem to be processed on the same machine on which they are read.
> As a result, message throughput doesn't increase with X machines compared
> to 1 machine. If one machine was reading 2K messages per second, X machines
> will be reading the same amount.
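The throughput ceiling described in the issue can be modeled in a few lines of Python. The sketch rests on these assumptions:

* each machine runs one reader
* every reader reads at the same rate
* reading is the bottleneck
* partitions are handed out round-robin, with each partition owned by exactly one reader

All names are hypothetical. Under these assumptions, readers beyond the partition count sit idle. With 1 partition, the aggregate rate therefore stays at 2K messages per second however many machines are added.

```python
def assign_round_robin(num_partitions, consumers):
    """Model of group assignment: every partition goes to exactly one
    consumer; consumers beyond the partition count receive nothing."""
    assignment = {c: [] for c in consumers}
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment


def read_throughput(num_partitions, num_machines, per_reader_rate):
    """Aggregate read rate when only readers owning at least one partition
    contribute (assumes one reader per machine and reading is the bottleneck)."""
    machines = [f"machine-{i}" for i in range(num_machines)]
    assignment = assign_round_robin(num_partitions, machines)
    active = sum(1 for parts in assignment.values() if parts)
    return active * per_reader_rate


# Columns: machines, rate with 1 partition, rate with 8 partitions.
for machines in (1, 4, 10):
    print(machines, read_throughput(1, machines, 2_000), read_throughput(8, machines, 2_000))
# 1 2000 2000
# 4 2000 8000
# 10 2000 16000
```

Real Kafka assignors (range, round-robin, sticky) differ in how they spread partitions. None of them gives one partition to two consumers in the same group.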
--
This message was sent by Atlassian Jira
(v8.3.4#803005)