[ 
https://issues.apache.org/jira/browse/FLINK-23527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen updated FLINK-23527:
---------------------------------
    Fix Version/s: 1.13.3

> Clarify `SourceFunction#cancel()` contract about interrupting
> -------------------------------------------------------------
>
>                 Key: FLINK-23527
>                 URL: https://issues.apache.org/jira/browse/FLINK-23527
>             Project: Flink
>          Issue Type: Bug
>          Components: API / DataStream, Documentation
>    Affects Versions: 1.13.1
>            Reporter: Piotr Nowojski
>            Assignee: Stephan Ewen
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 1.14.0, 1.13.3
>
>
> We should clarify the contract of {{SourceFunction#cancel()}}
> # source itself shouldn’t be interrupting the source thread
> # interrupt shouldn’t be expected in the clean cancellation case
> Interrupting the code on the clean shutdown path can cause failures when 
> doing `stop-with-savepoint`. If source thread is interrupted during 
> backpressure, this leaves network stack in invalid state, making it 
> impossible to send {{EndOfPartitionEvent}} to complete the shutdown.
> In a bit more detail, when source thread is backpressured, network stack 
> might have already sent a partial record and it could be waiting for a buffer 
> to finish writing/serialising that record. If network stack is interrupted 
> while waiting for that buffer, it will never resume writing/serialisation of 
> the remaining part of that record, while downstream node will be expecting 
> those bytes. If in this situation we attempt to emit anything (like 
> {{EndOfPartitionEvent}}), this will most likely cause deserialisation errors 
> on the downstream nodes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to