[
https://issues.apache.org/jira/browse/FLINK-6790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Flink Jira Bot updated FLINK-6790:
----------------------------------
Labels: auto-deprioritized-major auto-deprioritized-minor (was:
auto-deprioritized-major stale-minor)
Priority: Not a Priority (was: Minor)
This issue was labeled "stale-minor" 7 days ago and has not received any
updates so it is being deprioritized. If this ticket is actually Minor, please
raise the priority and ask a committer to assign you the issue or revive the
public discussion.
> Flink Kafka Consumer Cannot Round Robin Fetch Records
> -----------------------------------------------------
>
> Key: FLINK-6790
> URL: https://issues.apache.org/jira/browse/FLINK-6790
> Project: Flink
> Issue Type: Bug
> Components: Connectors / Kafka
> Affects Versions: 1.2.1, 1.3.0
> Reporter: xingHu
> Priority: Not a Priority
> Labels: auto-deprioritized-major, auto-deprioritized-minor
>
> The Java consumer fails to consume messages in a round robin fashion. This
> can lead to unbalanced consumption.
> In our use case we have a set of consumers that can take a significant amount
> of time consuming messages off a topic. For this reason, we are using the
> pause/poll/resume pattern to ensure the consumer session does not time out.
> The topic that is being consumed has been preloaded with messages. That means
> there is a significant message lag when the consumer is first started. To
> limit how many messages are consumed at a time, the consumer has been
> configured with max.poll.records=1.
> The first observation is that the client receives a large batch of
> messages for the first partition it decides to consume from and will consume
> all those messages before moving on, rather than returning a message from a
> different partition for each call to poll.
> We solved this issue by configuring max.partition.fetch.bytes to be small
> enough that only a single message will be returned by the broker on each
> fetch, although this would not be feasible if message size were highly
> variable.
> The behavior of the consumer after this change is to largely consume from a
> small number of partitions, usually just two, iterating between them, until
> it exhausts them, before moving to another partition. This behavior is
> problematic if the messages have some rough time semantics and need to be
> processed in roughly time order across all partitions.
> It would be useful if the source had a pluggable API that allowed custom logic
> to select which partition to consume from next, thus enabling the creation of
> a round robin partition consumer.
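
The pluggable selection logic requested in the description could look roughly
like the following. This is a minimal sketch under stated assumptions, not
Flink or Kafka API: PartitionSelector, RoundRobinSelector, and RoundRobinDemo
are hypothetical names, and in-memory queues stand in for the per-partition
batches a consumer has already fetched.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plug-in point: decides which partition to read from next.
interface PartitionSelector {
    // readyPartitions is non-empty and sorted ascending; returns one of its elements.
    int next(List<Integer> readyPartitions);
}

// Hypothetical round robin policy: serve the next partition id after the last
// one served, wrapping around to the lowest id when none is larger.
final class RoundRobinSelector implements PartitionSelector {
    private int last = -1;

    @Override
    public int next(List<Integer> readyPartitions) {
        int chosen = readyPartitions.get(0); // wrap-around default
        for (int p : readyPartitions) {
            if (p > last) {
                chosen = p;
                break;
            }
        }
        last = chosen;
        return chosen;
    }
}

final class RoundRobinDemo {
    // Emits one record at a time (mirroring max.poll.records=1), asking the
    // selector which partition buffer to take it from. Drains the given buffers.
    static List<String> drain(Map<Integer, Deque<String>> buffers, PartitionSelector selector) {
        List<String> out = new ArrayList<>();
        while (true) {
            List<Integer> ready = new ArrayList<>();
            for (Map.Entry<Integer, Deque<String>> e : buffers.entrySet()) {
                if (!e.getValue().isEmpty()) {
                    ready.add(e.getKey());
                }
            }
            if (ready.isEmpty()) {
                return out;
            }
            Collections.sort(ready);
            out.add(buffers.get(selector.next(ready)).poll());
        }
    }

    public static void main(String[] args) {
        Map<Integer, Deque<String>> buffers = new LinkedHashMap<>();
        buffers.put(0, new ArrayDeque<>(List.of("a0", "a1", "a2")));
        buffers.put(1, new ArrayDeque<>(List.of("b0")));
        buffers.put(2, new ArrayDeque<>(List.of("c0", "c1")));
        System.out.println(drain(buffers, new RoundRobinSelector()));
    }
}
```

With partitions 0, 1, and 2 holding three, one, and two buffered records, the
selector interleaves them rather than exhausting partition 0 first. A
time-ordered policy could implement the same hypothetical interface.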