[ https://issues.apache.org/jira/browse/FLINK-22368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332562#comment-17332562 ]

Roman Khachatryan commented on FLINK-22368:
-------------------------------------------

I also see that the job gets stuck because one of the tasks transitions from
RUNNING to CANCELLED (instead of FINISHED).
This happens because RemoteInputChannel is polled after it has been released.
In such a case, RemoteInputChannel
[throws|https://github.com/apache/flink/blob/e905db9e20950fc605350fad007b1c3e4f09de91/flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/consumer/RemoteInputChannel.java#L217]
CancelTaskException, which triggers task cancellation. Cancellation prevents the
EndOfPartition event from being propagated, and therefore downstream tasks keep
running.
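
A minimal sketch of this failure mode, assuming a single channel and string tokens in place of real buffers and events. All class names below (SimplifiedRemoteChannel, SimplifiedTask, ChannelReleasedException, TaskState) are hypothetical stand-ins, not Flink's RemoteInputChannel or CancelTaskException, and the model does not reproduce the actual race: it only shows why a poll after release ends the task as cancelled without ever forwarding EndOfPartition.

```java
import java.util.ArrayDeque;
import java.util.Optional;
import java.util.Queue;

/** Hypothetical stand-in for the cancellation exception; not Flink's CancelTaskException. */
class ChannelReleasedException extends RuntimeException {
    ChannelReleasedException() {
        super("polled a channel that was already released");
    }
}

/** Hypothetical, heavily simplified stand-in for a remote input channel. */
class SimplifiedRemoteChannel {
    private final Queue<String> buffers = new ArrayDeque<>();
    private boolean released;

    void onBuffer(String buffer) {
        buffers.add(buffer);
    }

    void release() {
        released = true;
        buffers.clear();
    }

    Optional<String> poll() {
        if (released) {
            // Polling after release fails the caller, as described in the comment.
            throw new ChannelReleasedException();
        }
        return Optional.ofNullable(buffers.poll());
    }
}

enum TaskState { RUNNING, FINISHED, CANCELED }

/** Hypothetical task loop that forwards EndOfPartition only on the normal path. */
class SimplifiedTask {
    static final String END_OF_PARTITION = "EndOfPartition";

    TaskState state = TaskState.RUNNING;
    boolean endOfPartitionForwarded;

    void run(SimplifiedRemoteChannel channel) {
        try {
            Optional<String> next;
            while ((next = channel.poll()).isPresent()) {
                if (END_OF_PARTITION.equals(next.get())) {
                    endOfPartitionForwarded = true; // downstream can now finish
                    state = TaskState.FINISHED;
                    return;
                }
            }
        } catch (ChannelReleasedException e) {
            // Cancellation path: EndOfPartition is never forwarded downstream,
            // so a downstream task waiting for it keeps running.
            state = TaskState.CANCELED;
        }
    }
}
```

Running a task against a channel that still holds a record and EndOfPartition ends in FINISHED with the event forwarded; releasing the channel before the task polls it ends in CANCELED with nothing forwarded, which is the state that leaves downstream tasks running.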

 

I've published a PR to prevent enqueuing a "released" channel (on receive) and
to validate that the channel hasn't received EoP (on poll).
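
One possible reading of the two guards, again as a hypothetical sketch rather than the code in the PR: the gate enqueues a channel only if the channel accepted the buffer (a released channel rejects it), and poll() fails with IllegalStateException if it is called after EndOfPartition was already delivered. GuardedChannel, GuardedGate and ReleasedChannelException are illustrative names, and treating the EoP check as a hard failure is an assumption.

```java
import java.util.ArrayDeque;
import java.util.Optional;
import java.util.Queue;

/** Hypothetical stand-in for the cancellation exception thrown on a released channel. */
class ReleasedChannelException extends RuntimeException {
    ReleasedChannelException() {
        super("polled a channel that was already released");
    }
}

/** Hypothetical simplified channel carrying the two guards. */
class GuardedChannel {
    static final String END_OF_PARTITION = "EndOfPartition";

    private final Queue<String> buffers = new ArrayDeque<>();
    private boolean released;
    private boolean endOfPartitionReceived;

    void release() {
        released = true;
        buffers.clear();
    }

    /** Returns true only if the gate should enqueue this channel as having data. */
    boolean onBuffer(String buffer) {
        if (released) {
            return false; // guard on receive: a released channel is never enqueued
        }
        buffers.add(buffer);
        return true;
    }

    Optional<String> poll() {
        if (released) {
            throw new ReleasedChannelException();
        }
        if (endOfPartitionReceived) {
            // guard on poll: nothing may be read after EndOfPartition
            throw new IllegalStateException("channel already delivered EndOfPartition");
        }
        String next = buffers.poll();
        if (END_OF_PARTITION.equals(next)) {
            endOfPartitionReceived = true;
        }
        return Optional.ofNullable(next);
    }
}

/** Hypothetical gate that enqueues a channel once per accepted buffer. */
class GuardedGate {
    private final Queue<GuardedChannel> channelsWithData = new ArrayDeque<>();

    void onBuffer(GuardedChannel channel, String buffer) {
        if (channel.onBuffer(buffer)) {
            channelsWithData.add(channel);
        }
    }

    Optional<String> pollNext() {
        GuardedChannel channel = channelsWithData.poll();
        return channel == null ? Optional.empty() : channel.poll();
    }
}
```

With the receive-side guard, a buffer announced after release never puts the channel back into the gate's queue, so the gate does not poll it and the cancellation exception is never raised.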

> UnalignedCheckpointITCase hangs on azure
> ----------------------------------------
>
>                 Key: FLINK-22368
>                 URL: https://issues.apache.org/jira/browse/FLINK-22368
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Task
>    Affects Versions: 1.13.0
>            Reporter: Dawid Wysakowicz
>            Assignee: Roman Khachatryan
>            Priority: Blocker
>              Labels: pull-request-available, test-stability
>             Fix For: 1.13.1
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=16818&view=logs&j=b0a398c0-685b-599c-eb57-c8c2a771138e&t=d13f554f-d4b9-50f8-30ee-d49c6fb0b3cc&l=10144



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
