[
https://issues.apache.org/jira/browse/FLINK-19385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201386#comment-17201386
]
Arvid Heise commented on FLINK-19385:
-------------------------------------
Since this issue would only occur with unaligned checkpoints combined with
input selection, a combination we currently do not support, I'd remove the
1.11 version tag and skip a backport.
> Channel recovery may deadlock
> -----------------------------
>
> Key: FLINK-19385
> URL: https://issues.apache.org/jira/browse/FLINK-19385
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Network, Runtime / Task
> Affects Versions: 1.12.0, 1.11.2
> Reporter: Roman Khachatryan
> Assignee: Roman Khachatryan
> Priority: Minor
> Labels: pull-request-available
> Fix For: 1.12.0
>
>
> Consider the following case:
> * Two InputGates
> * Input selection is not ALL (say FIRST initially)
> * Unaligned Checkpoints ON
> * on recovery, there are "parts" of records in all channels (actually 1 is
> enough I think)
> What happens:
> # StreamTask initiates recovery and schedules the partition request upon its end
> # All gates and channels will receive buffers from StateReader
> # All channels of a single gate will consume those state buffers -
> completing that gate's StateConsumedFuture
> # InputProcessor will return NOTHING_AVAILABLE (see
> StreamTwoInputProcessor.getInputStatus)
> # StreamTask will suspend its default action
> # State of the 2nd gate won't be consumed - so its StateConsumedFutures
> won't be completed - so no partitions will be requested
> Solution: request partitions independently for each channel.
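The scenario and the proposed fix can be illustrated with a small toy model. This is only a sketch built on plain CompletableFuture, not Flink's actual code: the class ChannelRecoverySketch, the Channel type, and the methods requestAfterAllConsumed, requestPerChannel and simulate are all hypothetical names invented for illustration. It assumes input selection stays on the first gate, so only that gate's recovered state is ever consumed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Toy model only; none of these types are Flink classes.
public class ChannelRecoverySketch {

    /** Hypothetical stand-in for an input channel that holds recovered state. */
    static final class Channel {
        final String name;
        final CompletableFuture<Void> stateConsumed = new CompletableFuture<>();
        boolean partitionRequested;

        Channel(String name) {
            this.name = name;
        }
    }

    /** Wiring described in the issue: request partitions only once ALL state is consumed. */
    static void requestAfterAllConsumed(List<Channel> all) {
        CompletableFuture<?>[] futures =
                all.stream().map(c -> c.stateConsumed).toArray(CompletableFuture[]::new);
        CompletableFuture.allOf(futures)
                .thenRun(() -> all.forEach(c -> c.partitionRequested = true));
    }

    /** Proposed wiring: each channel requests its partition as soon as its own state is consumed. */
    static void requestPerChannel(List<Channel> all) {
        for (Channel c : all) {
            c.stateConsumed.thenRun(() -> c.partitionRequested = true);
        }
    }

    /** Returns how many channels ended up requesting their partition. */
    static long simulate(boolean perChannel) {
        List<Channel> gate1 = List.of(new Channel("g1-c1"), new Channel("g1-c2"));
        List<Channel> gate2 = List.of(new Channel("g2-c1"));
        List<Channel> all = new ArrayList<>(gate1);
        all.addAll(gate2);

        if (perChannel) {
            requestPerChannel(all);
        } else {
            requestAfterAllConsumed(all);
        }

        // Assumption: input selection is FIRST, so only gate 1 is read and
        // only its state buffers get consumed; gate 2 is never touched.
        gate1.forEach(c -> c.stateConsumed.complete(null));

        return all.stream().filter(c -> c.partitionRequested).count();
    }

    public static void main(String[] args) {
        System.out.println("all-at-once requested: " + simulate(false));
        System.out.println("per-channel requested: " + simulate(true));
    }
}
```

In this model the all-at-once wiring never requests any partition, because the second gate's future never completes, while the per-channel wiring lets the first gate's channels request theirs, so new data can arrive on the selected input and processing can continue.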