Richard Yu created KAFKA-9285:
---------------------------------
Summary: Implement failed message topic to account for processing
lag during failure
Key: KAFKA-9285
URL: https://issues.apache.org/jira/browse/KAFKA-9285
Project: Kafka
Issue Type: New Feature
Components: consumer
Reporter: Richard Yu
Presently, in current Kafka failure schematics, when a consumer crashes, the
user is typically responsible for both detecting as well as restarting the
failed consumer. Therefore, during this period of time, when the consumer is
dead, it would result in a period of inactivity where no records are consumed,
hence lag results. Previously, there has been attempts to resolve this problem:
when failure is detected by broker, a substitute consumer will be started (the
so-called [Rebalance
Consumer|[https://cwiki.apache.org/confluence/display/KAFKA/KIP-333%3A+Add+faster+mode+of+rebalancing]])
which will continue processing records in Kafka's stead.
However, this has complications, as records will only be stored locally, and in
case of this consumer failing as well, that data will be lost. Instead, we need
to consider how we can still process these records and at the same time
effectively _persist_ them. It is here that I propose the concept of a _failed
message topic._ At a high level, it works like this. When we find that a
consumer has failed, messages which was originally meant to be sent to that
consumer would be redirected to this failed messaged topic. The user can choose
to assign consumers to this topic, which would consume messages from failed
consumers while other consumer threads are down.
Naturally, records from different topics can not go into the same failed
message topic, since we cannot tell which records belong to which consumer.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)