[
https://issues.apache.org/jira/browse/FLINK-21133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288467#comment-17288467
]
Piotr Nowojski edited comment on FLINK-21133 at 2/23/21, 8:54 AM:
------------------------------------------------------------------
+1 for those use cases/semantics summarised by [~trohrmann]. I agree that 3.
and 4. are also effectively the same. Let me try to conclude the various loose
threads that we had here. I see the following, mostly independent, issues:
a) Two phase commit support for 3. and 4. This will be dealt with by FLIP-147
(please check the discussion on the dev mailing list).
b) Unfortunately in FLINK-21132 we broke 3. (*stop-with-savepoint --drain*). In
this case, {{endOfInput()}} should be called (CC [~roman_khachatryan]).
Otherwise, some operators do not flush/drain their buffered state (for
example {{AsyncWaitOperator}}, which does so only in the
{{endOfInput()}} call). Note that before FLINK-21132, 3. was working correctly
only if we ignore the issue of committing side effects (two phase commit
support).
c) Changing 2., from "stop with savepoint" to "cancel with savepoint".
Previously I thought about it as a refactor/clean up AND optimisation (speed up
of the shutdown). However, as we cannot use this approach for 3., I think
it's just an optimisation that would make the code paths diverge. For this
reason I think it would be better to postpone such an optimisation until after
FLIP-147 is done (if ever).
d) FLIP-27 not supporting stop with savepoint (both 3. and 4.)
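To illustrate point b), here is a minimal sketch in plain Java. It is not Flink code: {{DrainSketch}}, {{BufferingOperator}}, {{run}} and the method names are hypothetical stand-ins for an operator that, like {{AsyncWaitOperator}} as described above, emits its buffered elements only from the end-of-input call. It only shows why skipping that call on the drain path would leave buffered records unemitted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DrainSketch {

    /** Hypothetical operator: buffers every element and emits only at end of input. */
    static final class BufferingOperator {
        private final List<String> buffer = new ArrayList<>();
        private final List<String> output = new ArrayList<>();

        void processElement(String element) {
            buffer.add(element); // nothing is emitted yet
        }

        void endOfInput() {
            output.addAll(buffer); // the only place where buffered state is flushed
            buffer.clear();
        }

        List<String> output() {
            return output;
        }

        int pending() {
            return buffer.size();
        }
    }

    /** Hypothetical shutdown path; the flag models whether the end-of-input call is made. */
    static BufferingOperator run(List<String> input, boolean callEndOfInput) {
        BufferingOperator op = new BufferingOperator();
        for (String element : input) {
            op.processElement(element);
        }
        if (callEndOfInput) {
            op.endOfInput();
        }
        return op;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", "b", "c");
        BufferingOperator drained = run(input, true);
        BufferingOperator notDrained = run(input, false);
        System.out.println("with endOfInput:    emitted=" + drained.output()
                + ", pending=" + drained.pending());
        System.out.println("without endOfInput: emitted=" + notDrained.output()
                + ", pending=" + notDrained.pending());
    }
}
```

In this sketch the run without the end-of-input call finishes with all elements still pending, which is the symptom described in b) for the drain case.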
> FLIP-27 Source does not work with synchronous savepoint
> -------------------------------------------------------
>
> Key: FLINK-21133
> URL: https://issues.apache.org/jira/browse/FLINK-21133
> Project: Flink
> Issue Type: Bug
> Components: API / Core, API / DataStream, Runtime / Checkpointing
> Affects Versions: 1.11.3, 1.12.1
> Reporter: Kezhu Wang
> Priority: Critical
> Fix For: 1.11.4, 1.13.0, 1.12.3
>
>
> I have pushed branch
> [synchronous-savepoint-conflict-with-bounded-end-input-case|https://github.com/kezhuw/flink/commits/synchronous-savepoint-conflict-with-bounded-end-input-case]
> in my repository. {{SavepointITCase.testStopSavepointWithFlip27Source}}
> failed due to timeout.
> See also FLINK-21132 and
> [apache/iceberg#2033|https://github.com/apache/iceberg/issues/2033].