[ 
https://issues.apache.org/jira/browse/FLINK-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16559809#comment-16559809
 ] 

Piotr Nowojski commented on FLINK-8523:
---------------------------------------

I think we shouldn't spill at all. Whenever sender serializes a checkpoint 
barrier, it could either stop requesting for floating buffers, or return them 
back to the receiver. Receiver could on the other hand not assign floating 
buffers for channels blocked on alignment. Data that follow after checkpoint 
barrier on the sender, would be just sitting in sender buffers waiting for more 
credits.

I know that this seems like natural expansion towards "as it was before", but 
spilling to disk in a network stack seems like a something very bad to me and I 
haven't seen/heard about similar solution somewhere else.

Besides being wrong conceptually to me, it would/could seriously affect 
performance, especially if IO is already saturated by RocksDB or checkpointing. 
In such situations even writing/reading a single byte can block system for very 
long times. Reading bytes from disk is just slower and doesn't scale compared 
to sending them over the network. To make matters worse, those spilled buffers 
would be tiny. Combined with multiple channels and multiple tasks we would be 
asking for quite a lot of small random reads/writes. Thus compared to old 
solution where we were spilling tons of data from a handful of streams, new one 
that spills less data could happen to not be significantly faster.

> Stop assigning floating buffers for blocked input channels in exactly-once 
> mode
> -------------------------------------------------------------------------------
>
>                 Key: FLINK-8523
>                 URL: https://issues.apache.org/jira/browse/FLINK-8523
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Network
>    Affects Versions: 1.5.0, 1.6.0
>            Reporter: zhijiang
>            Assignee: zhijiang
>            Priority: Major
>              Labels: pull-request-available
>
> In exactly-once mode, the input channel is set blocked state when reading 
> barrier from it. And the blocked state will be released after barrier 
> alignment or cancelled.
>  
> In credit-based network flow control, we should avoid assigning floating 
> buffers for blocked input channels because the buffers after barrier will not 
> be processed by operator until alignment.
> To do so, we can fully make use of floating buffers and speed up barrier 
> alignment in some extent.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to