[
https://issues.apache.org/jira/browse/FLINK-14872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16979969#comment-16979969
]
Piotr Nowojski commented on FLINK-14872:
----------------------------------------
[~kevin.cyj] let me think about it a bit more.
Side note: please do not confuse floating buffers (credit-based flow control)
with optional buffers. Floating buffers can float between channels, for example
within an {{InputGate}}. Optional buffers can be either floating or exclusive:
the InputGate/ResultPartition will request required buffers (1 per channel) and
optional ones (1 per channel + floating).
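To make the terminology concrete, here is a minimal, purely hypothetical sketch
of those request sizes (the class and method names are invented for
illustration; this is not Flink's actual API):
{code:java}
/**
 * Hypothetical sketch of the buffer requests described above
 * (names invented for illustration; not Flink's actual API).
 */
public final class BufferBudget {

    /** Required part of the request: 1 exclusive buffer per channel. */
    static int requiredBuffers(int numChannels) {
        return numChannels;
    }

    /**
     * Optional part of the request: 1 additional exclusive buffer per
     * channel plus the floating pool shared between all channels.
     * Note that "required" vs. "optional" is about whether the buffers
     * must be granted, while "exclusive" vs. "floating" is about which
     * channel may use them -- the two dimensions are orthogonal.
     */
    static int optionalBuffers(int numChannels, int numFloating) {
        return numChannels + numFloating;
    }
}
{code}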
I personally do not like the spilling approach. If we had to implement
something, I would vote more toward "assign required buffers immediately, and
the recommended ones only after the downstream consumers are guaranteed to make
progress".
For a quick fix, we might want to configure the {{InputGate}} for
{{BoundedBlockingSubpartition}} to always request the obligatory "1 exclusive
buffer per channel + a couple of floating ones", without any "optional"
buffers. We could probably even get away without any floating buffers, as
performance will be bottlenecked by reading from files on the sender side.
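A sketch of that quick fix's sizing, under the same hypothetical naming as
above (again, not an actual Flink API):
{code:java}
/** Hypothetical sizing for a gate that reads a BoundedBlockingSubpartition. */
public final class BlockingGateSizing {

    /** Everything obligatory: 1 exclusive buffer per channel plus a couple
     *  of floating ones, requested up front so the gate either gets them
     *  immediately or fails fast instead of deadlocking later. */
    static int requiredBuffers(int numChannels, int numFloating) {
        return numChannels + numFloating; // numFloating could even be 0
    }

    /** No "optional" buffers at all for blocking reads. */
    static int optionalBuffers() {
        return 0;
    }
}
{code}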
There is one more dimension here. There is a semi-known issue that Flink
allocates/requires way too many buffers in the network stack, which causes
problems with checkpointing under backpressure and with general memory
requirements. Currently we request 2 exclusive buffers per channel + 8
floating, both on the input and on the output. I'm pretty sure we could cut it
down to:
* on the input: 0 exclusive per channel + 10 (8? 20? 40?) floating
* on the output: 1 exclusive per channel + 10 (8? 20?) floating
without negative performance effects (10 floating buffers should be enough to
saturate a 1 Gbps network with a 1 ms message round trip).
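A back-of-the-envelope check of that claim, assuming Flink's default 32 KiB
network buffer size:
{code:java}
public class BandwidthDelayProduct {
    public static void main(String[] args) {
        double bytesPerSecond = 1_000_000_000.0 / 8; // 1 Gbps = 125 MB/s
        double roundTripSeconds = 0.001;             // 1 ms round trip
        int bufferSizeBytes = 32 * 1024;             // default segment size

        // Bytes that must be in flight to keep the link saturated.
        double inFlightBytes = bytesPerSecond * roundTripSeconds; // 125,000

        // Buffers needed to cover the bandwidth-delay product.
        double buffersNeeded = inFlightBytes / bufferSizeBytes;   // ~3.8

        System.out.printf("in flight: %.0f bytes -> %.1f buffers%n",
                inFlightBytes, buffersNeeded);
        // ~4 buffers cover the link, so 8-10 floating buffers leave headroom.
    }
}
{code}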
Maybe after cutting the exclusive buffers from 4 per channel (current input +
output) down to 1, we could make all of them obligatory, which could solve the
deadlock issues? But this would require some performance testing.
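For a sense of scale, a rough comparison of the current vs. proposed input-side
counts from the list above (again assuming 32 KiB buffers; the channel count is
an arbitrary example):
{code:java}
public class BufferFootprint {
    public static void main(String[] args) {
        int numChannels = 1000;              // e.g. a 1000-way all-to-all shuffle
        int bufferKiB = 32;                  // default segment size
        int current  = 2 * numChannels + 8;  // 2 exclusive per channel + 8 floating
        int proposed = 10;                   // 0 exclusive per channel + 10 floating
        System.out.printf("current:  %d buffers = %d KiB (~63 MiB)%n",
                current, current * bufferKiB);
        System.out.printf("proposed: %d buffers = %d KiB%n",
                proposed, proposed * bufferKiB);
    }
}
{code}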
> Potential deadlock for task reading from blocking ResultPartition.
> ------------------------------------------------------------------
>
> Key: FLINK-14872
> URL: https://issues.apache.org/jira/browse/FLINK-14872
> Project: Flink
> Issue Type: Bug
> Reporter: Yingjie Cao
> Priority: Blocker
> Fix For: 1.10.0
>
>
> Currently, the buffer pool size of an InputGate reading from a blocking
> ResultPartition is unbounded, so it can use too many buffers; the
> ResultPartition of the same task may then be unable to acquire enough core
> buffers, which finally leads to deadlock.
> Consider the following case:
> Core buffers are reserved for InputGate and ResultPartition -> The InputGate
> consumes lots of Buffers (not counting the buffers reserved for the
> ResultPartition) -> Other tasks acquire exclusive buffers for their
> InputGates and trigger a redistribution of Buffers (the Buffers taken by the
> first InputGate can not be released) -> The first task, whose InputGate uses
> lots of buffers, begins to emit records but can not acquire enough core
> Buffers (some operators may not emit records immediately, or there is simply
> nothing to emit) -> Deadlock.
>
> I think we can fix this problem by limiting the number of Buffers that can be
> allocated by an InputGate which reads from a blocking ResultPartition.