RivenSun created KAFKA-15185:
--------------------------------
Summary: Consumers using the latest strategy may lose data after
the topic adds partitions
Key: KAFKA-15185
URL: https://issues.apache.org/jira/browse/KAFKA-15185
Project: Kafka
Issue Type: Bug
Components: consumer
Affects Versions: 3.4.1
Reporter: RivenSun
Assignee: Luke Chen
h2. condition:
1. A business topic adds partitions.
2. metadata.max.age.ms is set to five minutes on both producers and consumers,
but the producer discovers the new partition before the consumer does and
produces 100 messages to it.
3. The consumer's auto.offset.reset is set to latest.
h2. result:
The consumer loses these 100 messages: when it finally discovers the new
partition, the group has no committed offset for it, so the consumer resets to
the log end and skips everything already produced.
We cannot simply set auto.offset.reset to {*}earliest{*}, because the user
requirement is that a newly subscribed group discards all old messages of the
topic.
However, once the group has subscribed, messages produced to newly added
partitions must not be lost; those partitions should be consumed as if
starting from the earliest offset.
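The race can be illustrated with a small simulation. This is only a toy model, not Kafka client code: `PartitionLog`, `initial_position`, and `simulate` are hypothetical names, and the sketch assumes the consumer has no committed offset for the new partition when it discovers it.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionLog:
    """Toy stand-in for the log of a single partition."""
    records: list = field(default_factory=list)

    def append(self, value):
        self.records.append(value)

    @property
    def log_end_offset(self):
        return len(self.records)


def initial_position(log, reset_policy):
    """Starting offset for a partition that has no committed offset."""
    if reset_policy == "earliest":
        return 0
    if reset_policy == "latest":
        return log.log_end_offset
    raise ValueError(f"unsupported reset policy: {reset_policy}")


def simulate(reset_policy, produced_before_discovery=100):
    """Return how many records the consumer skips on the new partition."""
    new_partition = PartitionLog()
    # The producer refreshes its metadata first and writes to the new partition.
    for i in range(produced_before_discovery):
        new_partition.append(f"msg-{i}")
    # The consumer discovers the partition later and applies its reset policy.
    start = initial_position(new_partition, reset_policy)
    consumed = new_partition.records[start:]
    return produced_before_discovery - len(consumed)


skipped_with_latest = simulate("latest")
skipped_with_earliest = simulate("earliest")
```

Under these assumptions the `latest` policy skips all 100 records, while `earliest` skips none.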
h2. suggestion:
As a workaround, we have set the consumer's metadata.max.age.ms to 1/2 or 1/3
of the producer's value.
This still does not solve the problem, because in many cases the producer may
force a metadata refresh.
In addition, a smaller metadata.max.age.ms causes more metadata refresh
requests, which increases the load on the brokers.
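A rough estimate of the extra load, as a sketch only: it assumes each client issues exactly one periodic refresh per metadata.max.age.ms interval and ignores forced refreshes. The client count of 1000 is an illustrative assumption, and `metadata_refreshes_per_hour` is a hypothetical helper.

```python
def metadata_refreshes_per_hour(num_clients, metadata_max_age_ms):
    """Periodic metadata refreshes per hour across all clients."""
    ms_per_hour = 60 * 60 * 1000
    return num_clients * ms_per_hour / metadata_max_age_ms


five_minutes_ms = 5 * 60 * 1000
# Baseline: consumers use the same five-minute setting as producers.
baseline = metadata_refreshes_per_hour(1000, five_minutes_ms)
# Workaround: consumers use one third of the producer's value.
reduced = metadata_refreshes_per_hour(1000, five_minutes_ms // 3)
```

With these numbers the periodic refresh rate goes from 12,000 to 36,000 requests per hour, and the race window is only narrowed, not closed.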
Could we add a parameter that controls whether the consumer starts consuming a
newly added partition from the earliest or the latest offset?
Perhaps during the rebalance, the leader consumer would need to mark which
partitions are newly added when computing the assignment.
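The decision this proposal asks for could look roughly like the sketch below. It is not an existing Kafka API: `choose_start_offset` and the `new_partition_reset` setting are hypothetical, the topic name `orders` is made up, and the sketch assumes the group can tell which partitions existed when it first subscribed.

```python
def choose_start_offset(partition, committed_offset, partitions_at_subscribe,
                        beginning_offset, end_offset,
                        new_partition_reset="earliest"):
    """Pick the starting offset for one assigned partition.

    new_partition_reset is a hypothetical setting, not an existing Kafka config.
    """
    # A committed offset always wins over any reset policy.
    if committed_offset is not None:
        return committed_offset
    # Partition existed when the group subscribed: behave like "latest".
    if partition in partitions_at_subscribe:
        return end_offset
    # Partition was added later: apply the separate policy for new partitions.
    if new_partition_reset == "earliest":
        return beginning_offset
    return end_offset


known = {("orders", 0), ("orders", 1)}
# Existing partition without a committed offset: old messages are discarded.
existing_start = choose_start_offset(("orders", 0), None, known, 0, 500)
# Partition added after the group subscribed: nothing is skipped.
added_start = choose_start_offset(("orders", 2), None, known, 0, 100)
```

Here the existing partition starts at its log end (500) while the newly added one starts at offset 0, which matches the two requirements described under result.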
--
This message was sent by Atlassian Jira
(v8.20.10#820010)