[ https://issues.apache.org/jira/browse/FLINK-10806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889876#comment-16889876 ]

Jiayi Liao commented on FLINK-10806:
------------------------------------

[~becket_qin] First, this applies not only to a newly discovered partition but
also to a newly added topic. Suppose I already have a streaming pipeline that
processes user-behaviour data collected from web browsers. Now I want to add a
new topic (user behaviour on mobile devices) to this pipeline, but the topic
has existed for a while and is already used by other pipelines. So I stop the
pipeline, add the topic to the KafkaConsumer topic list, and restart the
pipeline. The data of this topic is then consumed from the earliest offset,
because its partitions are newly discovered TopicAndPartitions that are not in
the restored state. I think it is reasonable that we should be able to consume
from the latest offset of this newly added topic.

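
To make the request concrete, here is a minimal, self-contained sketch of the
decision I have in mind. It does not use the Flink API: the class
NewTopicOffsetSketch, the TopicPartition key, the resolveStartOffsets method
and the sentinel values are all hypothetical names invented for illustration.
The assumption is that partitions already in the restored state keep their
offsets, new partitions of an already known topic still start from earliest
(as today), and only partitions of a topic that is absent from the restored
state use a configurable start offset.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical sketch, not Flink code: all names here are illustrative.
class NewTopicOffsetSketch {
    // Illustrative sentinels meaning "resolve against the broker"; not Flink's values.
    static final long EARLIEST = -2L;
    static final long LATEST = -1L;

    /** Hypothetical key: topic name plus partition index. */
    static final class TopicPartition {
        final String topic;
        final int partition;

        TopicPartition(String topic, int partition) {
            this.topic = topic;
            this.partition = partition;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof TopicPartition)) {
                return false;
            }
            TopicPartition other = (TopicPartition) o;
            return topic.equals(other.topic) && partition == other.partition;
        }

        @Override
        public int hashCode() {
            return Objects.hash(topic, partition);
        }
    }

    /**
     * Merges restored offsets with discovered partitions.
     * offsetForNewTopics is the proposed option (for example LATEST).
     */
    static Map<TopicPartition, Long> resolveStartOffsets(
            Map<TopicPartition, Long> restoredState,
            List<TopicPartition> discovered,
            long offsetForNewTopics) {
        // Topics that had at least one partition in the restored state.
        Set<String> knownTopics = new HashSet<>();
        for (TopicPartition tp : restoredState.keySet()) {
            knownTopics.add(tp.topic);
        }

        Map<TopicPartition, Long> result = new LinkedHashMap<>(restoredState);
        for (TopicPartition tp : discovered) {
            if (result.containsKey(tp)) {
                continue; // restored partition: keep its checkpointed offset
            }
            // Scaled-out partition of a known topic: earliest, so no data is skipped.
            // Partition of a brand-new topic: use the configured option.
            long start = knownTopics.contains(tp.topic) ? EARLIEST : offsetForNewTopics;
            result.put(tp, start);
        }
        return result;
    }
}
```

With a restored state containing only web/0 and discovered partitions web/0,
web/1 and mobile/0, passing LATEST keeps the web/0 offset, starts web/1 from
EARLIEST, and starts mobile/0 from LATEST.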
> Support multiple consuming offsets when discovering a new topic
> ---------------------------------------------------------------
>
> Key: FLINK-10806
> URL: https://issues.apache.org/jira/browse/FLINK-10806
> Project: Flink
> Issue Type: Improvement
> Components: Connectors / Kafka
> Affects Versions: 1.6.2, 1.8.1
> Reporter: Jiayi Liao
> Assignee: Jiayi Liao
> Priority: Major
>
> In KafkaConsumerBase, we discover the TopicPartitions and compare them with
> the restoredState. This is reasonable when a topic's partitions have been
> scaled. However, if we add a new topic that already holds a large amount of
> data and restore the Flink program, the data of the new topic will be
> consumed from the start, which may not be what we want. I think this should
> be an option for developers.
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)