[
https://issues.apache.org/jira/browse/FLINK-18238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17135862#comment-17135862
]
Yun Tang commented on FLINK-18238:
----------------------------------
[~pnowojski] Thanks for your clarification.
From my point of view, broadcasting the checkpoint barrier downstream first,
which is what our internal Flink version does, could solve this problem. In
other words, we need to move the current step 0 so that it runs after step 2:
{code:java}
// Step (2): Send the checkpoint barrier downstream
operatorChain.broadcastEvent(
    new CheckpointBarrier(metadata.getCheckpointId(), metadata.getTimestamp(), options),
    options.isUnalignedCheckpoint());
......
// Step (0): Record the last triggered checkpointId.
lastCheckpointId = metadata.getCheckpointId();
if (checkAndClearAbortedStatus(metadata.getCheckpointId())) {
    LOG.info("Checkpoint {} has been notified as aborted, would not trigger any checkpoint.",
        metadata.getCheckpointId());
    return;
}
{code}
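To make the effect of the reordering concrete, here is a minimal, self-contained sketch. It is not Flink code: {{BarrierOrderSketch}}, {{notifyAborted}}, {{checkpointCurrentOrder}}, {{checkpointProposedOrder}} and the {{broadcastBarriers}} list are hypothetical names invented for illustration, and only {{checkAndClearAbortedStatus}} mirrors a name from the snippet above. It assumes the abort notification for a checkpoint arrives before that checkpoint is triggered on the task.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the task-side checkpoint logic; not Flink's real classes.
public class BarrierOrderSketch {
    // Checkpoint ids for which an abort notification arrived before the trigger.
    private final Set<Long> abortedCheckpointIds = new HashSet<>();
    // Barriers that "downstream" has received, recorded by checkpoint id.
    final List<Long> broadcastBarriers = new ArrayList<>();

    void notifyAborted(long checkpointId) {
        abortedCheckpointIds.add(checkpointId);
    }

    // Returns true if the id was marked aborted, and clears the mark.
    private boolean checkAndClearAbortedStatus(long checkpointId) {
        return abortedCheckpointIds.remove(checkpointId);
    }

    // Current order: the abort check (step 0) runs before the broadcast (step 2).
    boolean checkpointCurrentOrder(long checkpointId) {
        if (checkAndClearAbortedStatus(checkpointId)) {
            return false; // early return, no barrier is sent downstream
        }
        broadcastBarriers.add(checkpointId);
        return true;
    }

    // Proposed order: broadcast first, then run the abort check.
    boolean checkpointProposedOrder(long checkpointId) {
        broadcastBarriers.add(checkpointId);
        if (checkAndClearAbortedStatus(checkpointId)) {
            return false; // no snapshot is taken, but the barrier was already sent
        }
        return true;
    }

    public static void main(String[] args) {
        BarrierOrderSketch current = new BarrierOrderSketch();
        current.notifyAborted(1L);
        current.checkpointCurrentOrder(1L);
        System.out.println("current order barriers: " + current.broadcastBarriers);

        BarrierOrderSketch proposed = new BarrierOrderSketch();
        proposed.notifyAborted(1L);
        proposed.checkpointProposedOrder(1L);
        System.out.println("proposed order barriers: " + proposed.broadcastBarriers);
    }
}
```

With the current order the aborted checkpoint never produces a barrier, so in this toy model downstream would keep waiting for it. With the proposed order the barrier is always sent and only the local snapshot is skipped.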
> RemoteChannelThroughputBenchmark deadlocks
> ------------------------------------------
>
> Key: FLINK-18238
> URL: https://issues.apache.org/jira/browse/FLINK-18238
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Checkpointing
> Affects Versions: 1.11.0
> Reporter: Piotr Nowojski
> Assignee: Yingjie Cao
> Priority: Blocker
> Fix For: 1.11.0
>
> Attachments: consoleText_remote_benchmark_deadlock.txt
>
>
> In the last couple of days
> {{RemoteChannelThroughputBenchmark.remoteRebalance}} deadlocked for the
> second time:
> http://codespeed.dak8s.net:8080/job/flink-master-benchmarks/6019/