[jira] [Commented] (FLINK-21080) Identify JobVertex containing legacy source operators and abort checkpoint with legacy source operators partially finished

Yun Gao (Jira) Mon, 19 Jul 2021 23:56:16 -0700


    [ https://issues.apache.org/jira/browse/FLINK-21080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17383817#comment-17383817 ]


Yun Gao commented on FLINK-21080:
---------------------------------

Some more notes:

# This is mainly for legacy sources that use union list state to maintain the 
offset of each split and re-discover splits on restore. 
# For other legacy sources, this would not cause new problems.
# An exception is the legacy continuous file source, which is composed of two 
operators: a source function that performs periodic discovery and an operator 
that processes each split. We should not need to handle the 
ContinuousFileReaderOperator specially: once a split has been discovered, 
assigned, and snapshotted, it will not be added again, because its modification 
time is not greater than the last snapshotted discovery time. 
# New sources should not need the fix. They usually either keep the unassigned 
splits and, on restore, use the restored unassigned splits directly without 
re-discovering, or keep the assigned / processed splits and always filter out 
the processed ones when re-discovering. Future new sources are expected to 
follow this paradigm as well and support checkpoints after tasks have finished. 
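
To make notes 3 and 4 concrete, here is a minimal Java sketch of the two re-discovery filters, assuming a simplified model where files and splits are plain string IDs. `RediscoveryFilterSketch` and its methods are hypothetical illustrations, not Flink APIs; they only roughly mirror what ContinuousFileMonitoringFunction and the new sources' split enumerators do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: these names are illustrations, not Flink APIs.
final class RediscoveryFilterSketch {

    // Continuous-file-source style (note 3): skip any file whose modification
    // time is not strictly greater than the discovery time restored from the
    // snapshot, so already-assigned splits are not added again.
    static List<String> filterByModificationTime(
            Map<String, Long> discoveredFiles, long lastDiscoveryTime) {
        List<String> newFiles = new ArrayList<>();
        for (Map.Entry<String, Long> e : discoveredFiles.entrySet()) {
            if (e.getValue() > lastDiscoveryTime) {
                newFiles.add(e.getKey());
            }
        }
        return newFiles;
    }

    // New-source style (note 4, second paradigm): the checkpoint keeps the set
    // of assigned / processed splits, and re-discovery drops them after restore.
    static List<String> filterProcessedSplits(
            List<String> discoveredSplits, Set<String> processedSplits) {
        List<String> pending = new ArrayList<>();
        for (String split : discoveredSplits) {
            if (!processedSplits.contains(split)) {
                pending.add(split);
            }
        }
        return pending;
    }
}
```

In both cases the state needed to avoid re-reading is independent of which subtasks have finished, which is why these sources do not need the checkpoint-abort fix.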


> Identify JobVertex containing legacy source operators and abort checkpoint 
> with legacy source operators partially finished
> --------------------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-21080
>                 URL: https://issues.apache.org/jira/browse/FLINK-21080
>             Project: Flink
>          Issue Type: Sub-task
>          Components: API / DataStream, Runtime / Checkpointing
>            Reporter: Yun Gao
>            Assignee: Yun Gao
>            Priority: Major
>              Labels: auto-unassigned
>
> Most legacy source operators record the offset of each partition, and after 
> recovery they read from the recorded offsets. If some subtasks finish before 
> a checkpoint, the corresponding partition offsets are dropped from the 
> checkpoint. If the job then recovers from this checkpoint, the legacy source 
> re-discovers all the partitions and re-reads those of the finished tasks, 
> since their offsets were not recorded. 
> Therefore, we would like to fail the checkpoint if some legacy source 
> operators have only part of their subtasks finished. 
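
The problem and the proposed rule can be sketched in Java under a simplified model: each subtask snapshots a partition-to-offset map, finished subtasks contribute nothing, and a re-discovered partition with no restored offset is assumed to restart from offset 0. `LegacySourceCheckpointGuard`, `VertexStatus`, and both methods are hypothetical; they are not Flink's checkpoint coordinator API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; not Flink's checkpoint coordinator API.
final class LegacySourceCheckpointGuard {

    // Simplified view of one job vertex at checkpoint time.
    static final class VertexStatus {
        final String name;
        final boolean containsLegacySource;
        final int parallelism;
        final int finishedSubtasks;

        VertexStatus(String name, boolean containsLegacySource, int parallelism, int finishedSubtasks) {
            this.name = name;
            this.containsLegacySource = containsLegacySource;
            this.parallelism = parallelism;
            this.finishedSubtasks = finishedSubtasks;
        }
    }

    // Union-list-state restore: every subtask sees the union of all snapshotted
    // entries. Finished subtasks snapshotted nothing, so their partitions fall
    // back to offset 0 (an assumption of this model) and are read again.
    static Map<String, Long> startOffsetsAfterRestore(
            List<Map<String, Long>> snapshottedPerSubtask, List<String> rediscoveredPartitions) {
        Map<String, Long> union = new HashMap<>();
        for (Map<String, Long> subtaskState : snapshottedPerSubtask) {
            union.putAll(subtaskState);
        }
        Map<String, Long> startOffsets = new TreeMap<>();
        for (String partition : rediscoveredPartitions) {
            startOffsets.put(partition, union.getOrDefault(partition, 0L));
        }
        return startOffsets;
    }

    // Abort the checkpoint if any vertex with a legacy source has at least one,
    // but not all, of its subtasks finished.
    static boolean shouldAbortCheckpoint(List<VertexStatus> vertices) {
        for (VertexStatus v : vertices) {
            boolean partiallyFinished = v.finishedSubtasks > 0 && v.finishedSubtasks < v.parallelism;
            if (v.containsLegacySource && partiallyFinished) {
                return true;
            }
        }
        return false;
    }
}
```

With one of two legacy source subtasks finished, `startOffsetsAfterRestore` returns offset 0 for the finished subtask's partition, which is exactly the re-read described above; `shouldAbortCheckpoint` rejects that checkpoint, while vertices with no finished or all finished subtasks are left alone.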



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
