[
https://issues.apache.org/jira/browse/FLINK-21080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17384910#comment-17384910
]
Piotr Nowojski commented on FLINK-21080:
----------------------------------------
{quote}
I think this issue indeed exists: with union list state, a job may rely on a
single task to write the value to implement a broadcast state pattern, but if
this task is finished, this piece of information is lost.
{quote}
A note: I think {{BroadcastState}} is something different from
{{UnionListState}}, isn't it? Also, with {{UnionListState}} it is not
necessarily only the single {{subtaskId == 0}} subtask that writes/updates the
state. There might be cases where there are many writers.
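To make the concern concrete, here is a minimal sketch of the union semantics. It is a plain Java model, not Flink code: {{UnionListStateSketch}}, {{restoredUnion}} and the map/set parameters are hypothetical names, and it assumes that a subtask which finished before the checkpoint contributes nothing to it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model, not Flink code: every running subtask contributes its
// own list to the checkpoint, and on restore each subtask gets the union.
final class UnionListStateSketch {

    // snapshots: subtask index -> elements written by that subtask.
    // finished:  subtasks that finished before the checkpoint; by assumption
    //            their elements are missing from the checkpoint.
    static List<String> restoredUnion(Map<Integer, List<String>> snapshots,
                                      Set<Integer> finished) {
        List<String> union = new ArrayList<>();
        for (Map.Entry<Integer, List<String>> entry : snapshots.entrySet()) {
            if (!finished.contains(entry.getKey())) {
                union.addAll(entry.getValue());
            }
        }
        return union;
    }
}
```

With a single writer (only subtask 0 holds an element) and subtask 0 finished, the restored union is empty. With many writers, only the elements of the finished writers go missing.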
{quote}
It seems that the probability of hitting this case (an operator uses union list
state, some of its subtasks are finished, and we happen to take a checkpoint)
would not be too high
{quote}
{{FlinkKafkaProducer}} uses {{UnionListState}}, so it might not be as uncommon
as you think?
Let's see whether others come up with different proposals. It would also be
nice to estimate the complexity of supporting finished subtasks with
{{UnionListState}}, but I would lean towards an easy stop-gap solution
(aborting checkpoints if {{UnionListState}} is detected with finished subtasks)
to unblock this FLIP, and maybe improve it incrementally.
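A rough sketch of what such a stop-gap check could look like. This is plain Java, not the actual checkpoint coordinator code: {{CheckpointGuardSketch}}, {{OperatorInfo}} and {{shouldAbort}} are hypothetical names. It also assumes that only partially finished operators are a problem, as in the issue title; treating fully finished operators the same way would be a small change.

```java
import java.util.List;

// Hypothetical stop-gap check, not Flink's checkpoint coordinator.
final class CheckpointGuardSketch {

    // Hypothetical summary of one operator at checkpoint time.
    static final class OperatorInfo {
        final boolean usesUnionListState;
        final int parallelism;
        final int finishedSubtasks;

        OperatorInfo(boolean usesUnionListState, int parallelism, int finishedSubtasks) {
            this.usesUnionListState = usesUnionListState;
            this.parallelism = parallelism;
            this.finishedSubtasks = finishedSubtasks;
        }
    }

    // Abort if any operator uses union list state and has some,
    // but not all, of its subtasks finished.
    static boolean shouldAbort(List<OperatorInfo> operators) {
        for (OperatorInfo op : operators) {
            boolean partiallyFinished =
                    op.finishedSubtasks > 0 && op.finishedSubtasks < op.parallelism;
            if (op.usesUnionListState && partiallyFinished) {
                return true;
            }
        }
        return false;
    }
}
```

The check is deliberately conservative: it aborts without looking at which subtasks actually wrote to the state, which is what keeps it simple.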
> Identify JobVertex containing legacy source operators and abort checkpoint
> with legacy source operators partially finished
> --------------------------------------------------------------------------------------------------------------------------
>
> Key: FLINK-21080
> URL: https://issues.apache.org/jira/browse/FLINK-21080
> Project: Flink
> Issue Type: Sub-task
> Components: API / DataStream, Runtime / Checkpointing
> Reporter: Yun Gao
> Assignee: Yun Gao
> Priority: Major
> Labels: auto-unassigned
>
> Most legacy source operators record the offset for each partition and, after
> recovery, resume reading from the recorded offsets. If some subtasks finish
> before a checkpoint, the corresponding partition offsets are dropped from
> the checkpoint. If the job then recovers from this checkpoint, the legacy
> source rediscovers all the partitions and, for those that belonged to
> finished subtasks, re-reads them since their offsets were not recorded.
> Therefore, we would like to fail the checkpoint if some legacy source
> operators have only part of their subtasks finished.