[
https://issues.apache.org/jira/browse/FLINK-32019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated FLINK-32019:
-----------------------------------
Labels: pull-request-available (was: )
> EARLIEST offset strategy for partitions discoveried later based on FLIP-288
> ---------------------------------------------------------------------------
>
> Key: FLINK-32019
> URL: https://issues.apache.org/jira/browse/FLINK-32019
> Project: Flink
> Issue Type: Sub-task
> Components: Connectors / Kafka
> Affects Versions: kafka-3.0.0
> Reporter: Hongshun Wang
> Assignee: Hongshun Wang
> Priority: Major
> Labels: pull-request-available
> Fix For: kafka-4.0.0
>
>
> As described in
> [FLIP-288|https://cwiki.apache.org/confluence/display/FLINK/FLIP-288%3A+Enable+Dynamic+Partition+Discovery+by+Default+in+Kafka+Source],
> the strategy used for new partitions is the same as the initial offset
> strategy, which is not reasonable.
> According to the semantics, if the startup strategy is latest, the consumed
> data should include all data from the moment of startup, which also includes
> all messages from new created partitions. However, the latest strategy
> currently maybe used for new partitions, leading to the loss of some data
> (thinking a new partition is created and might be discovered by Kafka source
> several minutes later, and the message produced into the partition within the
> gap might be dropped if we use for example "latest" as the initial offset
> strategy).if the data from all new partitions is not read, it does not meet
> the user's expectations.
> Other ploblems see final Section of
> [FLIP-288|https://cwiki.apache.org/confluence/display/FLINK/FLIP-288%3A+Enable+Dynamic+Partition+Discovery+by+Default+in+Kafka+Source]:
> {{User specifies OffsetsInitializer for new partition}} .
> Therefore, it’s better to provide an *EARLIEST* strategy for later discovered
> partitions.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)