[
https://issues.apache.org/jira/browse/FLINK-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16559809#comment-16559809
]
Piotr Nowojski commented on FLINK-8523:
---------------------------------------
I think we shouldn't spill at all. Whenever sender serializes a checkpoint
barrier, it could either stop requesting for floating buffers, or return them
back to the receiver. Receiver could on the other hand not assign floating
buffers for channels blocked on alignment. Data that follow after checkpoint
barrier on the sender, would be just sitting in sender buffers waiting for more
credits.
I know that this seems like natural expansion towards "as it was before", but
spilling to disk in a network stack seems like a something very bad to me and I
haven't seen/heard about similar solution somewhere else.
Besides being wrong conceptually to me, it would/could seriously affect
performance, especially if IO is already saturated by RocksDB or checkpointing.
In such situations even writing/reading a single byte can block system for very
long times. Reading bytes from disk is just slower and doesn't scale compared
to sending them over the network. To make matters worse, those spilled buffers
would be tiny. Combined with multiple channels and multiple tasks we would be
asking for quite a lot of small random reads/writes. Thus compared to old
solution where we were spilling tons of data from a handful of streams, new one
that spills less data could happen to not be significantly faster.
> Stop assigning floating buffers for blocked input channels in exactly-once
> mode
> -------------------------------------------------------------------------------
>
> Key: FLINK-8523
> URL: https://issues.apache.org/jira/browse/FLINK-8523
> Project: Flink
> Issue Type: Sub-task
> Components: Network
> Affects Versions: 1.5.0, 1.6.0
> Reporter: zhijiang
> Assignee: zhijiang
> Priority: Major
> Labels: pull-request-available
>
> In exactly-once mode, the input channel is set blocked state when reading
> barrier from it. And the blocked state will be released after barrier
> alignment or cancelled.
>
> In credit-based network flow control, we should avoid assigning floating
> buffers for blocked input channels because the buffers after barrier will not
> be processed by operator until alignment.
> To do so, we can fully make use of floating buffers and speed up barrier
> alignment in some extent.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)