[ 
https://issues.apache.org/jira/browse/FLINK-4545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15632781#comment-15632781
 ] 

Greg Hogan commented on FLINK-4545:
-----------------------------------

In 1.2 we now expose metrics for the number of allocated and in use network 
buffers. This is now in the web UI but wasn't properly populating the values.

Is there an exact "best" number of network buffers to allocate? Flink may run 
with a minimal allocation but this may cause additional backpressure while 
waiting for buffers to be processed and become free. We recommend something 
similar by advising users to maximize the TaskManager memory allocation, which 
may be better for streaming than batch since after spilling to disk the 
merge-sort makes use of the filesystem's read-ahead cache.

+1 to dynamic scaling, and also if/when memory buffers are dynamically 
allocated between operators.

> Flink automatically manages TM network buffer
> ---------------------------------------------
>
>                 Key: FLINK-4545
>                 URL: https://issues.apache.org/jira/browse/FLINK-4545
>             Project: Flink
>          Issue Type: Wish
>            Reporter: Zhenzhong Xu
>
> Currently, the number of network buffer per task manager is preconfigured and 
> the memory is pre-allocated through taskmanager.network.numberOfBuffers 
> config. In a Job DAG with shuffle phase, this number can go up very high 
> depends on the TM cluster size. The formula for calculating the buffer count 
> is documented here 
> (https://ci.apache.org/projects/flink/flink-docs-master/setup/config.html#configuring-the-network-buffers).
>   
> #slots-per-TM^2 * #TMs * 4
> In a standalone deployment, we may need to control the task manager cluster 
> size dynamically and then leverage the up-coming Flink feature to support 
> scaling job parallelism/rescaling at runtime. 
> If the buffer count config is static at runtime and cannot be changed without 
> restarting task manager process, this may add latency and complexity for 
> scaling process. I am wondering if there is already any discussion around 
> whether the network buffer should be automatically managed by Flink or at 
> least expose some API to allow it to be reconfigured. Let me know if there is 
> any existing JIRA that I should follow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to