[
https://issues.apache.org/jira/browse/KAFKA-20035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18060358#comment-18060358
]
fujian edited comment on KAFKA-20035 at 2/23/26 2:33 PM:
---------------------------------------------------------
To be honest, I see this more as a bug rather than an issue related to which
{{auto.offset.reset}} strategy should be chosen. Expanding partitions is a
normal operational activity, and regardless of the configured strategy, it
should not result in partial message loss.
In fact, precisely because of this issue, we have had to configure
{{auto.offset.reset}} to {{earliest}} in our production environment. When we
set it to {{latest}}, our expectation is simply that consumers do not need to
process historical data at startup. We are also aware that we may lose some
data if the committed offset is missing for some special reason (for example,
KAFKA-19902). However, we certainly do not expect data loss during normal
operations, such as when partitions are expanded.
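As an illustration of the workaround (a minimal sketch only; the bootstrap address and group id are placeholders, not our actual settings):

```java
import java.util.Properties;

public class ConsumerConfigSketch {
    // Consumer properties we fall back to so that a newly added partition
    // is read from its beginning when no committed offset exists for it yet.
    static Properties workaroundConfig() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("group.id", "example-group");           // placeholder
        // "earliest" avoids skipping records written to a new partition
        // before the consumer first initializes it.
        props.put("auto.offset.reset", "earliest");
        return props;
    }
}
```

The cost is that a consumer group starting with no committed offsets at all replays the topic from the beginning, which is why we would prefer a proper fix over this workaround.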
For these reasons, I think this issue is a bug that is still worth addressing.
As for the specific approach to the fix, that is, of course, a separate
discussion.
The above is just my personal perspective as a user, for your reference only.
Thanks
> Prevent data loss during partition expansion by enforcing "earliest" offset
> reset for dynamically added partitions
> ------------------------------------------------------------------------------------------------------------------
>
> Key: KAFKA-20035
> URL: https://issues.apache.org/jira/browse/KAFKA-20035
> Project: Kafka
> Issue Type: Bug
> Components: clients, consumer, core, group-coordinator
> Reporter: Chia-Ping Tsai
> Assignee: Ken Huang
> Priority: Critical
> Labels: kip
>
> Currently, when a consumer group is configured with {{{}auto.offset.reset =
> latest{}}}, dynamically adding new partitions to a subscribed topic can lead
> to data loss due to a race condition.
> The scenario is as follows:
> # A group subscribes to a topic with {{{}auto.offset.reset = latest{}}}.
> # The topic is expanded (e.g., from 3 to 4 partitions).
> # Producers immediately start writing data to the new partition (Partition
> 3).
> # The Group Coordinator detects the change and assigns Partition 3 to a
> member.
> # The member initializes the partition. Since there is no committed offset,
> it applies the {{latest}} reset strategy and seeks to the log-end offset.
> # *Result: Any messages written to Partition 3 between step 3 and step 5 are
> skipped and lost.*
> From a user's perspective, {{latest}} should mean "start consuming from the
> point of subscription," not "skip data from newly created infrastructure."
> KIP-1282:
> [https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=406619800]
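The race described in the steps above can be sketched as a toy model (plain Java, not Kafka client code; the position rule below is a deliberate simplification of the consumer's offset-reset logic):

```java
public class OffsetResetSketch {
    // Toy rule: where a consumer starts reading a partition. With no
    // committed offset, "latest" seeks to the log-end offset and
    // "earliest" seeks to offset 0.
    static long initialPosition(long logEndOffset, Long committedOffset, String resetPolicy) {
        if (committedOffset != null) {
            return committedOffset;
        }
        return "latest".equals(resetPolicy) ? logEndOffset : 0L;
    }

    public static void main(String[] args) {
        // Partition 3 is created and producers write 5 records (offsets 0..4)
        // before the group coordinator assigns it to a member.
        long logEnd = 5L;
        System.out.println(initialPosition(logEnd, null, "latest"));   // 5: records 0..4 skipped
        System.out.println(initialPosition(logEnd, null, "earliest")); // 0: nothing lost
    }
}
```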
--
This message was sent by Atlassian Jira
(v8.20.10#820010)