[ https://issues.apache.org/jira/browse/FLINK-18238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17136511#comment-17136511 ]
Piotr Nowojski commented on FLINK-18238:
----------------------------------------
Copying the result of an online discussion: we decided to go with broadcasting
checkpoint cancellation markers from
{{SubtaskCheckpointCoordinatorImpl#checkpointState}} in the case when the
{{notifyCheckpointAborted}} RPC call was received before the checkpoint was
triggered. This guarantees that downstream tasks will always eventually stop
the alignment.
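To illustrate the decision above, here is a minimal, self-contained sketch of the idea. It is not the actual Flink code: the class {{AbortAwareCheckpointer}}, its fields, and the string-based {{broadcastLog}} are hypothetical stand-ins, and only the method names {{notifyCheckpointAborted}} and {{checkpointState}} are borrowed from the comment. It assumes that checkpoint ids increase monotonically and that both methods run on the same thread.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical, simplified model of the decision; not the real Flink class. */
class AbortAwareCheckpointer {
    // Ids whose abort notification arrived before the checkpoint was triggered.
    private final Set<Long> abortedBeforeTrigger = new HashSet<>();
    private long lastTriggeredId = -1L;
    // Stand-in for the task's outputs: records what was sent downstream.
    final List<String> broadcastLog = new ArrayList<>();

    /** Models the abort RPC: remember the id only if it has not been triggered yet. */
    void notifyCheckpointAborted(long checkpointId) {
        if (checkpointId > lastTriggeredId) {
            abortedBeforeTrigger.add(checkpointId);
        }
    }

    /** Models the trigger path. Returns true if a snapshot would be taken. */
    boolean checkpointState(long checkpointId) {
        lastTriggeredId = Math.max(lastTriggeredId, checkpointId);
        boolean alreadyAborted = abortedBeforeTrigger.remove(checkpointId);
        // Drop stale ids that can no longer be triggered (assumes increasing ids).
        abortedBeforeTrigger.removeIf(id -> id <= checkpointId);
        if (alreadyAborted) {
            // Broadcast a cancellation marker so downstream tasks stop aligning.
            broadcastLog.add("CANCEL-" + checkpointId);
            return false;
        }
        // Normal path: broadcast the barrier and proceed with the snapshot.
        broadcastLog.add("BARRIER-" + checkpointId);
        return true;
    }
}
```

In this sketch a downstream task sees either a barrier or a cancellation marker for every triggered checkpoint id, which is what lets it eventually leave the alignment.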
We could further optimise the process by cancelling the task's ongoing
alignment as soon as it receives the {{notifyCheckpointAborted}} RPC, but that
would require more extensive changes that we do not need to make right now.
> RemoteChannelThroughputBenchmark deadlocks
> ------------------------------------------
>
> Key: FLINK-18238
> URL: https://issues.apache.org/jira/browse/FLINK-18238
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Checkpointing
> Affects Versions: 1.11.0
> Reporter: Piotr Nowojski
> Assignee: Yingjie Cao
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 1.11.0
>
> Attachments: consoleText_remote_benchmark_deadlock.txt
>
>
> In the last couple of days
> {{RemoteChannelThroughputBenchmark.remoteRebalance}} deadlocked for the
> second time:
> http://codespeed.dak8s.net:8080/job/flink-master-benchmarks/6019/
--
This message was sent by Atlassian Jira
(v8.3.4#803005)