Zhenzhong Xu created FLINK-4545:
-----------------------------------

             Summary: Flink automatically manages TM network buffer
                 Key: FLINK-4545
                 URL: https://issues.apache.org/jira/browse/FLINK-4545
             Project: Flink
          Issue Type: Wish
            Reporter: Zhenzhong Xu



Currently, the number of network buffer per task manager is preconfigured and 
the memory is pre-allocated through taskmanager.network.numberOfBuffers config. 
In a Job DAG with shuffle phase, this number can go up very high depends on the 
TM cluster size. The formula for calculating the buffer count is documented 
here 
(https://ci.apache.org/projects/flink/flink-docs-master/setup/config.html#configuring-the-network-buffers).
  

#slots-per-TM^2 * #TMs * 4

In a standalone deployment, we may need to control the task manager cluster 
size dynamically and then leverage the up-coming Flink feature to support 
scaling job parallelism/rescaling at runtime. 
If the buffer count config is static at runtime and cannot be changed without 
restarting task manager process, this may add latency and complexity for 
scaling process. I am wondering if there is already any discussion around 
whether the network buffer should be automatically managed by Flink or at least 
expose some API to allow it to be reconfigured. Let me know if there is any 
existing JIRA that I should follow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to