[ https://issues.apache.org/jira/browse/FLINK-14872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16979969#comment-16979969 ]

Piotr Nowojski commented on FLINK-14872:
----------------------------------------

[~kevin.cyj] let me think about it a bit more. 

Side note: please do not confuse floating buffers (credit-based flow control) 
and optional buffers. Floating buffers can float between the channels of, for 
example, an {{InputGate}}. Optional buffers can be either floating or 
exclusive: {{InputGate}}/{{ResultPartition}} will request the required buffers 
(1 per channel) plus the optional ones (1 more per channel + the floating 
buffers).
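
To make the distinction concrete, here is a minimal sketch of how a required 
request could differ from an optional one; the classes and names are 
hypothetical, not Flink's actual buffer pool:

{code:java}
// Hypothetical sketch, not Flink's actual classes: "required" requests must
// be granted (or setup fails fast), "optional" requests are best-effort.
class BufferTaxonomySketch {

    /** Buffers pinned to a single channel; never shared. */
    static class ExclusiveQuota {
        int free;
        ExclusiveQuota(int perChannel) { this.free = perChannel; }
    }

    /** Buffers that may float between all channels of one gate. */
    static class FloatingPool {
        int free;
        FloatingPool(int size) { this.free = size; }
    }

    static boolean request(ExclusiveQuota exclusive, FloatingPool floating, boolean required) {
        if (exclusive.free > 0) {
            exclusive.free--;
            return true;
        }
        if (floating.free > 0) {
            floating.free--;
            return true;
        }
        if (required) {
            // The 1-per-channel minimum must always be satisfiable.
            throw new IllegalStateException("required buffer could not be granted");
        }
        return false; // optional request denied; caller must tolerate this
    }
}
{code}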
 
I personally do not like the spilling approach. If we had to implement 
something, I would vote more toward "assign required buffers immediately, and 
recommended ones only after the downstream consumers are guaranteed to make 
progress". 

For a quick fix, we might want to configure the {{InputGate}} for 
{{BoundedBlockingSubpartition}} to always request the obligatory "1 exclusive 
buffer per channel + a couple of floating", without any "optional" buffers. 
Probably we could even get away without any floating buffers, as performance 
will be bottlenecked by reading from files on the sender side.
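
A hedged sketch of what that configuration decision could look like; the 
names here ({{ConsumedPartitionType}}, {{BufferCounts}}) are illustrative, 
not Flink's API:

{code:java}
// Illustrative sketch of the quick fix: pick buffer counts per partition type.
class GateBufferConfigSketch {

    enum ConsumedPartitionType { PIPELINED, BLOCKING }

    static final class BufferCounts {
        final int exclusivePerChannel;
        final int floating;
        final boolean allObligatory;

        BufferCounts(int exclusivePerChannel, int floating, boolean allObligatory) {
            this.exclusivePerChannel = exclusivePerChannel;
            this.floating = floating;
            this.allObligatory = allObligatory;
        }
    }

    static BufferCounts forInputGate(ConsumedPartitionType type) {
        if (type == ConsumedPartitionType.BLOCKING) {
            // Quick fix: only obligatory buffers, no "optional" ones. A couple
            // of obligatory floating buffers could be added, but since the
            // sender is bottlenecked by file reads, 0 floating is likely enough.
            return new BufferCounts(1, 0, true);
        }
        // Current defaults for pipelined exchanges: 2 exclusive per channel
        // + 8 floating, everything beyond the minimum being optional.
        return new BufferCounts(2, 8, false);
    }
}
{code}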
 
There is one more dimension here. There is a semi-known issue that Flink 
allocates/requires way too many buffers in the network stack, which causes 
problems with checkpointing under backpressure and with general memory 
requirements. Currently we request 2 exclusive buffers per channel + 8 
floating, both on the input and on the output. I'm pretty sure we could cut 
this down to:
* on the input: 0 exclusive per channel + 10 (8? 20? 40?) floating
* on the output: 1 exclusive per channel + 10 (8? 20?) floating  
without negative performance effects (10 floating buffers should be enough to 
saturate a 1 Gbps network with a 1 ms message round trip).
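
A back-of-the-envelope check of that claim, assuming Flink's default 32 KiB 
network buffer size:

{code:java}
// Bandwidth-delay product: how many buffers must be in flight to keep
// a 1 Gbps link busy with a 1 ms round trip?
class BandwidthDelaySketch {
    public static void main(String[] args) {
        double linkBitsPerSec = 1_000_000_000d; // 1 Gbps
        double roundTripSec   = 0.001;          // 1 ms
        double bufferBytes    = 32 * 1024;      // default 32 KiB segment size

        // Bytes that must be in flight to saturate the link.
        double inFlightBytes = linkBitsPerSec / 8 * roundTripSec; // = 125_000
        double buffersNeeded = inFlightBytes / bufferBytes;       // ~ 3.8

        System.out.printf("in-flight: %.0f bytes, i.e. ~%.1f buffers%n",
                inFlightBytes, buffersNeeded);
        // ~4 buffers of credit cover the bandwidth-delay product, so
        // 10 floating buffers leave comfortable headroom.
    }
}
{code}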

Maybe after cutting the exclusive buffers from 4 per channel (current input + 
output) down to 1, we could make all of them obligatory, which could solve the 
deadlock issues? But this would require some performance testing.

> Potential deadlock for task reading from blocking ResultPartition.
> ------------------------------------------------------------------
>
>                 Key: FLINK-14872
>                 URL: https://issues.apache.org/jira/browse/FLINK-14872
>             Project: Flink
>          Issue Type: Bug
>            Reporter: Yingjie Cao
>            Priority: Blocker
>             Fix For: 1.10.0
>
>
> Currently, the buffer pool size of an InputGate reading from a blocking 
> ResultPartition is unbounded, which has the potential of using too many 
> buffers and may prevent the ResultPartition of the same task from acquiring 
> enough core buffers, finally leading to deadlock.
> Consider the following case:
> # Core buffers are reserved for the InputGate and the ResultPartition.
> # The InputGate consumes lots of buffers (not including the buffers 
> reserved for the ResultPartition).
> # Other tasks acquire exclusive buffers for their InputGates and trigger a 
> redistribution of buffers (the buffers taken by the first InputGate can not 
> be released).
> # The first task, whose InputGate uses lots of buffers, begins to emit 
> records but can not acquire enough core buffers (some operators may not 
> emit records immediately, or there is just nothing to emit).
> # Deadlock.
>  
> I think we can fix this problem by limiting the number of buffers that can 
> be allocated by an InputGate which reads from a blocking ResultPartition.
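
For concreteness, a minimal sketch of the limit proposed in the description; 
the pool class is hypothetical, not Flink's actual {{LocalBufferPool}}:

{code:java}
// Hypothetical sketch: bound the gate's pool instead of letting it grow
// without limit when it consumes a blocking ResultPartition.
class BoundedGatePoolSketch {

    final int numRequired; // reserved minimum: 1 buffer per channel
    final int maxUsed;     // hard cap instead of today's unbounded growth
    int used;

    BoundedGatePoolSketch(int numChannels, int maxFloating) {
        this.numRequired = numChannels;
        this.maxUsed = numChannels + maxFloating;
    }

    /** Optional request: denied once the cap is reached. */
    boolean tryRequestBuffer() {
        if (used >= maxUsed) {
            // The gate stops growing here, so the remaining global buffers
            // stay available for the same task's ResultPartition and the
            // deadlock chain above can not start.
            return false;
        }
        used++;
        return true;
    }
}
{code}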


