[ 
https://issues.apache.org/jira/browse/FLINK-23527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Piotr Nowojski updated FLINK-23527:
-----------------------------------
    Description: 
We should clarify the contract of {{SourceFunction#cancel()}}

# source itself shouldn’t be interrupting the source thread
# interrupt shouldn’t be expected in the clean cancellation case

Interrupting the code on the clean shutdown path can cause failures when doing 
`stop-with-savepoint`. If source thread is interrupted during backpressure, 
this leaves network stack in invalid state, making it impossible to send 
{{EndOfPartitionEvent}} to complete the shutdown.

In a bit more detail, when source thread is backpressured, network stack might 
have already sent a partial record and it could be waiting for a buffer to 
finish writing/serialising that record. If network stack is interrupted while 
waiting for that buffer, it will never resume writing/serialisation of the 
remaining part of that record, while downstream node will be expecting those 
bytes. If in this situation we attempt to emit anything (like 
{{EndOfPartitionEvent}}), this will most likely cause deserialisation errors on 
the downstream nodes.

  was:
We should clarify the contract of {{SourceFunction#cancel()}}

# source itself shouldn’t be interrupting the source thread
# interrupt shouldn’t be expected in the clean cancellation case

Interrupting the code on the clean shutdown path can cause failures when doing 
`stop-with-savepoint`. If source thread is interrupted during backpressure, 
this leaves network stack in invalid state, making it impossible to send 
{{EndOfPartitionEvent}} to complete the shutdown.


> Clarify `SourceFunction#cancel()` contract about interrupting
> -------------------------------------------------------------
>
>                 Key: FLINK-23527
>                 URL: https://issues.apache.org/jira/browse/FLINK-23527
>             Project: Flink
>          Issue Type: Bug
>          Components: API / DataStream
>    Affects Versions: 1.13.1
>            Reporter: Piotr Nowojski
>            Priority: Critical
>             Fix For: 1.14.0
>
>
> We should clarify the contract of {{SourceFunction#cancel()}}
> # source itself shouldn’t be interrupting the source thread
> # interrupt shouldn’t be expected in the clean cancellation case
> Interrupting the code on the clean shutdown path can cause failures when 
> doing `stop-with-savepoint`. If source thread is interrupted during 
> backpressure, this leaves network stack in invalid state, making it 
> impossible to send {{EndOfPartitionEvent}} to complete the shutdown.
> In a bit more detail, when source thread is backpressured, network stack 
> might have already sent a partial record and it could be waiting for a buffer 
> to finish writing/serialising that record. If network stack is interrupted 
> while waiting for that buffer, it will never resume writing/serialisation of 
> the remaining part of that record, while downstream node will be expecting 
> those bytes. If in this situation we attempt to emit anything (like 
> {{EndOfPartitionEvent}}), this will most likely cause deserialisation errors 
> on the downstream nodes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to