[
https://issues.apache.org/jira/browse/FLINK-10806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16884836#comment-16884836
]
Jiayi Liao commented on FLINK-10806:
------------------------------------
[~becket_qin] Hi Jiangjie, thank you for your attention to this :).
This is about the "addDiscoveredPartitions" method in AbstractFetcher.java.
Newly discovered KafkaTopicPartitions are consumed from the earliest offset, which
may not always be what we want. Suppose we add a new topic to the topic list of
the FlinkKafkaConsumer08 that serves as the application's source. The topic will be
found by the PartitionDiscoverer and consumed from the earliest offset, which is
problematic if the topic has been in production for a long time (i.e., it carries a
large amount of historical data), as illustrated below.
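To make that concrete, here is a deliberately simplified, hypothetical sketch of what
conceptually happens when the PartitionDiscoverer hands newly found partitions to the
fetcher; the class, field, and sentinel names are illustrative rather than the actual
AbstractFetcher internals:
{code:java}
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.flink.streaming.connectors.kafka.internals.KafkaTopicPartition;

/** Conceptual illustration only; names and the sentinel value are hypothetical. */
public class DiscoveredPartitionSeeding {

    /** Sentinel meaning "start from the earliest available offset" (illustrative value). */
    private static final long EARLIEST_OFFSET_SENTINEL = -1L;

    /** Start offsets the fetcher will use for each subscribed partition. */
    private final Map<KafkaTopicPartition, Long> subscribedPartitionOffsets = new HashMap<>();

    /**
     * Mirrors the current behavior: every newly discovered partition is seeded with
     * the earliest-offset sentinel, and the user has no way to pick a different
     * start position (e.g. latest) for partitions of a newly added topic.
     */
    public void addDiscoveredPartitions(List<KafkaTopicPartition> newPartitions) {
        for (KafkaTopicPartition partition : newPartitions) {
            subscribedPartitionOffsets.put(partition, EARLIEST_OFFSET_SENTINEL);
        }
    }
}
{code}
An option to choose the start position for partitions discovered at runtime (e.g.
earliest vs. latest) is essentially what this issue proposes.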
I'm not sure whether this is a common case or not, and I'd be glad to hear your
opinions.
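For reference, the setup in which the problem shows up looks roughly like the sketch
below; the broker address, ZooKeeper address, group id, topic names, and discovery
interval are placeholders:
{code:java}
import java.util.Arrays;
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase;

public class NewTopicDiscoveryExample {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");  // placeholder broker
        props.setProperty("zookeeper.connect", "localhost:2181");  // required by the 0.8 consumer
        props.setProperty("group.id", "example-group");            // placeholder group id
        // Enable periodic topic/partition discovery (interval in milliseconds).
        props.setProperty(FlinkKafkaConsumerBase.KEY_PARTITION_DISCOVERY_INTERVAL_MILLIS, "60000");

        FlinkKafkaConsumer08<String> consumer = new FlinkKafkaConsumer08<>(
                Arrays.asList("existing-topic", "newly-added-topic"),  // placeholder topics
                new SimpleStringSchema(),
                props);

        // The start position configured here applies to the partitions known at startup.
        // Partitions of "newly-added-topic" that are not in the restored state are
        // treated as newly discovered and read from the earliest offset, as described above.
        consumer.setStartFromLatest();

        env.addSource(consumer).print();
        env.execute("new-topic-discovery-example");
    }
}
{code}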
> Support multiple consuming offsets when discovering a new topic
> ---------------------------------------------------------------
>
> Key: FLINK-10806
> URL: https://issues.apache.org/jira/browse/FLINK-10806
> Project: Flink
> Issue Type: Improvement
> Components: Connectors / Kafka
> Affects Versions: 1.6.2
> Reporter: Jiayi Liao
> Assignee: Jiayi Liao
> Priority: Major
>
> In KafkaConsumerBase, we discover the TopicPartitions and compare them with the
> restoredState. That is reasonable when a topic's partitions are scaled. However,
> if we add a new topic that holds a large amount of data and then restore the Flink
> program, the new topic's data will be consumed from the beginning, which may not
> be what we want. I think this should be an option for developers.
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)