[
https://issues.apache.org/jira/browse/FLINK-21080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17383817#comment-17383817
]
Yun Gao commented on FLINK-21080:
---------------------------------
Some more notes:
# This is mainly for legacy source use union list state to maintains the offset
for each split, and do re-discovery on restoring.
# For other legacy sources, it would not cause new problem.
# An exception is the legacy continuous file source, it composes of two
operators, a source function to do periodic discovery and an operator to
process each split. We should not need to specially deal with the
ContinuousFileReaderOperator since if a split is discovered and assigned and
snapshotted, then the split won't be added again since its time is not greater
than the last discovery time snapshotted.
# New sources should not need the fix, it usually either keep the unassigned
splits and when restored, it would not re-discover and directly use the
restored unassigned splits, or it would keep the assigned / processed splits,
and always filter the processed ones when re-discovering. It is expected that
the following new sources should also follow this paradigm and support
checkpoints after tasks finished.
> Identify JobVertex containing legacy source operators and abort checkpoint
> with legacy source operators partially finished
> --------------------------------------------------------------------------------------------------------------------------
>
> Key: FLINK-21080
> URL: https://issues.apache.org/jira/browse/FLINK-21080
> Project: Flink
> Issue Type: Sub-task
> Components: API / DataStream, Runtime / Checkpointing
> Reporter: Yun Gao
> Assignee: Yun Gao
> Priority: Major
> Labels: auto-unassigned
>
> Most legacy source operators would record the offset for each partitions, and
> after recovery it would read from the recorded offset. If before a checkpoint
> some subtasks are finished, the corresponding partition offsets would be
> deserted in the checkpoint. Then if the job recover with this checkpoint, the
> legacy source would re-discovery all the partitions and for those finished
> tasks, the legacy source would re-read them since their offsets are not
> recorded.
> Therefore, we would like to fail the checkpoint if some legacy source
> operators have part of subtasks finished.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)