Hi, devs & users
Sorry for the broken formatting in my previous mail; I am resending the discussion below.
As discussed in FLIP-131 [1], Flink will make DataStream the unified API for
processing bounded and unbounded data in both streaming and blocking modes.
However, one long-standing problem in streaming mode is that Flink currently
does not support checkpoints after some tasks have finished, which causes
problems for bounded or mixed jobs:
1. In streaming mode, Flink's exactly-once sinks rely on checkpoints to decide
when data can be committed to external systems without the risk of it being
replayed after a failover. If the sources are bounded and checkpoints are
disabled once some tasks have finished, the data sent after the last checkpoint
can never be committed. This issue has already been reported several times on
the user mailing list [2][3][4] and was brought up again during the work on
FLIP-143: Unified Sink API [5].
2. Jobs with both bounded and unbounded sources might have to replay a large
number of records after a failover, because no periodic checkpoints are taken
after the bounded sources have finished.
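To make problem 1 concrete, here is a minimal, hypothetical simulation (plain Python, not Flink's API; the class and function names are made up for illustration) of a sink that buffers records and makes them visible externally only when a checkpoint completes. It assumes checkpoints are triggered every `checkpoint_every` records and simply stop once the bounded source finishes, unless checkpoints after finished tasks are supported.

```python
class CommitOnCheckpointSink:
    """Hypothetical sink: records are committed only when a checkpoint completes."""

    def __init__(self):
        self.pending = []    # records written since the last completed checkpoint
        self.committed = []  # records visible in the external system

    def write(self, record):
        self.pending.append(record)

    def on_checkpoint_complete(self):
        # Commit everything buffered up to this checkpoint.
        self.committed.extend(self.pending)
        self.pending.clear()


def run_bounded_job(records, checkpoint_every, checkpoints_after_finish):
    """Simulate a bounded source feeding the sink (assumed, simplified model)."""
    sink = CommitOnCheckpointSink()
    for i, record in enumerate(records, start=1):
        sink.write(record)
        if i % checkpoint_every == 0:
            sink.on_checkpoint_complete()
    # The bounded source has finished here. Without checkpoints after
    # finished tasks, no further checkpoint ever completes.
    if checkpoints_after_finish:
        sink.on_checkpoint_complete()
    return sink
```

With 10 records and a checkpoint every 4 records, `run_bounded_job(list(range(10)), 4, False)` commits only records 0 to 7, leaving 8 and 9 pending forever; passing `True` commits all 10.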
Therefore, we propose to also support checkpoints after some tasks have finished.
You can find more details in FLIP-147 [6].
Best,
Yun
[1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158866741
[2] https://lists.apache.org/thread.html/rea1ac2d82f646fcea1395b5738be495f144c5b0312a290a1d4a339c1%40%3Cuser.flink.apache.org%3E
[3] https://lists.apache.org/thread.html/rad4adeec838093b8b56ae9e2ea6a937a4b1882b53045a12acb7e61ea%40%3Cuser.flink.apache.org%3E
[4] https://lists.apache.org/thread.html/4cf28a9fa3732dfdd9e673da6233c5288ca80b20d58cee130bf1c141%40%3Cuser.flink.apache.org%3E
[5] https://cwiki.apache.org/confluence/display/FLINK/FLIP-143%3A+Unified+Sink+API
[6] https://cwiki.apache.org/confluence/display/FLINK/FLIP-147%3A+Support+Checkpoints+After+Tasks+Finished
------------------Original Mail ------------------
Sender: Yun Gao <[email protected]>
Send Date: Fri Oct 9 14:16:52 2020
Recipients: Flink Dev <[email protected]>, User-Flink <[email protected]>
Subject:[DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished