[ 
https://issues.apache.org/jira/browse/FLINK-18238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17135862#comment-17135862
 ] 

Yun Tang commented on FLINK-18238:
----------------------------------

[~pnowojski] Thanks for your clarification.

From my point of view, broadcasting the checkpoint barrier downstream first, 
which is what our internal Flink version does, could solve this problem. In 
other words, we need to move the current step 0 so that it runs after step 2:

{code:java}
// Step (2): Send the checkpoint barrier downstream
operatorChain.broadcastEvent(
        new CheckpointBarrier(metadata.getCheckpointId(), metadata.getTimestamp(), options),
        options.isUnalignedCheckpoint());

......

// Step (0): Record the last triggered checkpointId.
lastCheckpointId = metadata.getCheckpointId();
if (checkAndClearAbortedStatus(metadata.getCheckpointId())) {
        LOG.info("Checkpoint {} has been notified as aborted, would not trigger any checkpoint.", metadata.getCheckpointId());
        return;
}
{code}
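
To make the effect of the reordering easier to see, here is a minimal, self-contained sketch. It is a toy model and not Flink code: BarrierOrderSketch, ToyTask, the broadcastFirst flag and the in-memory list standing in for the downstream channel are all hypothetical names introduced for illustration. It only shows that, when a checkpoint has already been notified as aborted, the current order returns before any barrier is sent, while the proposed order always sends it.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class BarrierOrderSketch {

    /** Toy stand-in for a task; hypothetical, not a Flink class. */
    static class ToyTask {
        // true = proposed order (barrier first), false = current order (abort check first)
        private final boolean broadcastFirst;
        private final Set<Long> abortedCheckpointIds = new HashSet<>();
        // Stands in for the downstream channel: records every barrier that was sent.
        final List<Long> barriersSentDownstream = new ArrayList<>();
        long lastCheckpointId = -1L;

        ToyTask(boolean broadcastFirst) {
            this.broadcastFirst = broadcastFirst;
        }

        void notifyCheckpointAborted(long checkpointId) {
            abortedCheckpointIds.add(checkpointId);
        }

        // Returns true if the checkpoint was marked as aborted, and clears the mark.
        private boolean checkAndClearAbortedStatus(long checkpointId) {
            return abortedCheckpointIds.remove(checkpointId);
        }

        private void broadcastBarrier(long checkpointId) {
            barriersSentDownstream.add(checkpointId);
        }

        /** Returns true if the (toy) snapshot would go ahead. */
        boolean triggerCheckpoint(long checkpointId) {
            if (broadcastFirst) {
                // Proposed order: send the barrier before the abort check.
                broadcastBarrier(checkpointId);
            }
            // Record the last triggered checkpoint id and check the abort status.
            lastCheckpointId = checkpointId;
            if (checkAndClearAbortedStatus(checkpointId)) {
                return false; // early return: nothing after this point runs
            }
            if (!broadcastFirst) {
                // Current order: the barrier is only sent if the check passed.
                broadcastBarrier(checkpointId);
            }
            return true;
        }
    }

    public static void main(String[] args) {
        ToyTask current = new ToyTask(false);
        current.notifyCheckpointAborted(7L);
        current.triggerCheckpoint(7L);
        System.out.println("current order:  " + current.barriersSentDownstream);

        ToyTask proposed = new ToyTask(true);
        proposed.notifyCheckpointAborted(7L);
        proposed.triggerCheckpoint(7L);
        System.out.println("proposed order: " + proposed.barriersSentDownstream);
    }
}
```

In this toy model the first task sends no barrier for checkpoint 7 and the second one does. Whether a downstream task that never receives the barrier is really what blocks the benchmark is an assumption of the sketch, not something it demonstrates.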



> RemoteChannelThroughputBenchmark deadlocks
> ------------------------------------------
>
>                 Key: FLINK-18238
>                 URL: https://issues.apache.org/jira/browse/FLINK-18238
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.11.0
>            Reporter: Piotr Nowojski
>            Assignee: Yingjie Cao
>            Priority: Blocker
>             Fix For: 1.11.0
>
>         Attachments: consoleText_remote_benchmark_deadlock.txt
>
>
> In the last couple of days 
> {{RemoteChannelThroughputBenchmark.remoteRebalance}} deadlocked for the 
> second time:
> http://codespeed.dak8s.net:8080/job/flink-master-benchmarks/6019/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
