[
https://issues.apache.org/jira/browse/FLINK-22368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332562#comment-17332562
]
Roman Khachatryan commented on FLINK-22368:
-------------------------------------------
I also see that the job gets stuck because one of the tasks transitions
from RUNNING to CANCELED (instead of FINISHED).
This happens because RemoteInputChannel is polled after it has been released. In
such a case, RemoteInputChannel
[throws|https://github.com/apache/flink/blob/e905db9e20950fc605350fad007b1c3e4f09de91/flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/consumer/RemoteInputChannel.java#L217]
CancelTaskException, which triggers task cancellation. Cancellation prevents the
EndOfPartition event from being propagated, so downstream tasks keep
running.
I've published a PR to prevent enqueuing a "released" channel (on receive) and to
validate that the channel hasn't received EoP (on poll).
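To illustrate the two guards, here is a minimal, self-contained Java sketch. It
is a toy model, not Flink code and not the contents of the PR: the names
ReleasedChannelSketch, Channel, Gate, CancelTaskSignal, notifyDataAvailable and
pollNext are all hypothetical, and skipping a channel on poll is just one
possible way to "validate" it; the real change may do this differently.

```java
import java.util.ArrayDeque;
import java.util.Optional;
import java.util.Queue;

/** Toy model of the race; every name here is hypothetical, not a Flink API. */
final class ReleasedChannelSketch {

    static final String END_OF_PARTITION = "EOP";

    /** Stand-in for CancelTaskException in this sketch. */
    static final class CancelTaskSignal extends RuntimeException {
        CancelTaskSignal(String message) {
            super(message);
        }
    }

    /** Stand-in for an input channel that refuses to be polled once released. */
    static final class Channel {
        private final Queue<String> buffers = new ArrayDeque<>();
        private boolean released;
        private boolean endOfPartitionReceived;

        void onBuffer(String data) {
            buffers.add(data);
        }

        void release() {
            released = true;
            buffers.clear();
        }

        boolean isReleased() {
            return released;
        }

        boolean hasReceivedEndOfPartition() {
            return endOfPartitionReceived;
        }

        Optional<String> poll() {
            if (released) {
                // Models the problematic path: polling after release cancels the task.
                throw new CancelTaskSignal("channel polled after release");
            }
            String next = buffers.poll();
            if (END_OF_PARTITION.equals(next)) {
                endOfPartitionReceived = true;
            }
            return Optional.ofNullable(next);
        }
    }

    /** Stand-in for the gate that tracks which channels have data to read. */
    static final class Gate {
        private final Queue<Channel> channelsWithData = new ArrayDeque<>();

        /** Guard on receive: a released channel is never enqueued. */
        boolean notifyDataAvailable(Channel channel) {
            if (channel.isReleased()) {
                return false;
            }
            channelsWithData.add(channel);
            return true;
        }

        /** Guard on poll: skip channels released or finished while they were queued. */
        Optional<String> pollNext() {
            Channel channel;
            while ((channel = channelsWithData.poll()) != null) {
                if (channel.isReleased() || channel.hasReceivedEndOfPartition()) {
                    continue;
                }
                Optional<String> next = channel.poll();
                if (next.isPresent()) {
                    return next;
                }
            }
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        Gate gate = new Gate();
        Channel channel = new Channel();
        channel.onBuffer("record");
        gate.notifyDataAvailable(channel);
        channel.release(); // released while still sitting in the queue
        System.out.println("poll after release: " + gate.pollNext());
    }
}
```

In this model the gate never reaches the throwing branch of Channel.poll(), so a
channel released while queued no longer turns into a cancellation signal.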
> UnalignedCheckpointITCase hangs on azure
> ----------------------------------------
>
> Key: FLINK-22368
> URL: https://issues.apache.org/jira/browse/FLINK-22368
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Task
> Affects Versions: 1.13.0
> Reporter: Dawid Wysakowicz
> Assignee: Roman Khachatryan
> Priority: Blocker
> Labels: pull-request-available, test-stability
> Fix For: 1.13.1
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=16818&view=logs&j=b0a398c0-685b-599c-eb57-c8c2a771138e&t=d13f554f-d4b9-50f8-30ee-d49c6fb0b3cc&l=10144