[ https://issues.apache.org/jira/browse/KAFKA-12256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Guozhang Wang resolved KAFKA-12256.
-----------------------------------
Fix Version/s: 3.2.0
Resolution: Fixed
> auto commit causes delays due to retriable UNKNOWN_TOPIC_OR_PARTITION
> ---------------------------------------------------------------------
>
> Key: KAFKA-12256
> URL: https://issues.apache.org/jira/browse/KAFKA-12256
> Project: Kafka
> Issue Type: Bug
> Components: consumer
> Affects Versions: 2.0.0
> Reporter: Ryan Leslie
> Priority: Minor
> Labels: new-consumer-threading-should-fix
> Fix For: 3.2.0
>
>
> In KAFKA-6829 a change was made to the consumer to internally retry commits
> upon receiving UNKNOWN_TOPIC_OR_PARTITION.
> Though this helped mitigate issues around stale broker metadata, there were
> some valid concerns around the negative effects for routine topic deletion:
> https://github.com/apache/kafka/pull/4948
> In particular, if a commit is issued for a deleted topic, retries can block
> the consumer for up to max.poll.interval.ms. This is tunable of course, but
> any amount of stalling in a consumer can lead to unnecessary lag.
> One of the assumptions while permitting the change was that in practice it
> should be rare for commits to occur for deleted topics, since that would
> imply messages were being read or published at the time of deletion. It's
> fair to expect users to not delete topics that are actively published to. But
> this assumption is false in cases where auto commit is enabled.
> With the current implementation of auto commit, the consumer will regularly
> issue commits for all topics being fetched from, regardless of whether or not
> messages were actually received. The fetch positions are simply flushed, even
> when they are 0. This is simple and generally efficient, though it does mean
> commits are often redundant. Besides the auto commit interval, commits are
> also issued at the time of rebalance, which is often precisely at the time
> topics are deleted.
> This means that in practice commits for deleted topics are not really rare.
> This is particularly an issue when the consumer is subscribed to a multitude
> of topics using a wildcard. For example, a consumer might subscribe to a
> particular "flavor" of topic with the aim of auditing all such data, and
> these topics might dynamically come and go. The consumer's metadata and
> rebalance mechanisms are meant to handle this gracefully, but the end result
> is that such groups are often blocked in a commit for several seconds or
> minutes (the default max.poll.interval.ms is 5 minutes) whenever a delete
> occurs. This can sometimes result in significant lag.
> Besides having users abandon auto commit in the face of topic deletes, there
> are probably multiple ways to deal with this, including reconsidering whether
> commits still truly need to be retried here, or whether this behavior should
> be more configurable, e.g. by having a separate commit timeout or policy. In some
> cases the loss of a commit and subsequent message duplication is still
> preferred to processing delays. And having an artificially low
> max.poll.interval.ms or rebalance.timeout.ms comes with its own set of
> concerns.
> At the very least, the current behavior and pitfalls around deleting topics
> with active consumers should be documented.
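
The workaround the description mentions (abandoning auto commit where topics are deleted under active consumers) might look roughly like the configuration sketch below. The property names are standard consumer settings. The class name, method name, broker address and group id are hypothetical placeholders, and this is not the change that resolved the issue.

```java
import java.util.Properties;

// Hypothetical helper, for illustration only.
final class ManualCommitConfigSketch {

    // Builds consumer properties with auto commit turned off.
    // bootstrapServers and groupId are placeholders supplied by the caller.
    static Properties manualCommitConfig(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("group.id", groupId);
        // Default is "true". With "false" the consumer stops issuing commits
        // on the auto commit interval and at rebalance time.
        props.setProperty("enable.auto.commit", "false");
        // The bound on a blocked commit retry named in the description,
        // shown here at its default of 5 minutes.
        props.setProperty("max.poll.interval.ms", "300000");
        return props;
    }

    public static void main(String[] args) {
        Properties props = manualCommitConfig("localhost:9092", "audit-group");
        System.out.println("enable.auto.commit=" + props.getProperty("enable.auto.commit"));
    }
}
```

With auto commit off, the application decides when to commit. One option is KafkaConsumer.commitSync(Duration), available in clients 2.0.0 and later, which caps how long a single commit call may block. Whether that fully avoids the stall on affected versions is not something the issue text confirms.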