Hi Luke,
       Yes, it is really not very friendly for users to set the number of 
network buffers. And it may cause OOM exception if not set the proper amount 
for large scale job.
The proper amount should be calculated automatically by framework based on the 
number of input and output channels for each task, and @Nico Kruber is working 
on 
this feature now. You can check this jira for some details 
"https://issues.apache.org/jira/browse/FLINK-4545";.
Cheers
Zhijiang
------------------------------------------------------------------发件人:Luke 
Hutchison (JIRA) <j...@apache.org>发送时间:2017年3月15日(星期三) 17:01收件人:dev 
<dev@flink.apache.org>主 题:[jira] [Created] (FLINK-6057) Better default needed 
for num network buffers
Luke Hutchison created FLINK-6057:
-------------------------------------

             Summary: Better default needed for num network buffers
                 Key: FLINK-6057
                 URL: https://issues.apache.org/jira/browse/FLINK-6057
             Project: Flink
          Issue Type: Bug
          Components: Core
    Affects Versions: 1.2.0
            Reporter: Luke Hutchison


Using the default environment,

{code}
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
{code}

my code will sometimes fail with an error that Flink ran out of network 
buffers. To fix this, I have to do:

{code}
int numTasks = Runtime.getRuntime().availableProcessors();
config.setInteger(ConfigConstants.DEFAULT_PARALLELISM_KEY, numTasks);
config.setInteger(ConfigConstants.TASK_MANAGER_NUM_TASK_SLOTS, numTasks);
config.setInteger(ConfigConstants.TASK_MANAGER_NETWORK_NUM_BUFFERS_KEY, 
numTasks * 2048);
{code}

The default value of 2048 fails when I increase the degree of parallelism for a 
large Flink pipeline (hence the fix to set the number of buffers to numTasks * 
2048).

This is particularly problematic because a pipeline can work fine on one 
machine, and when you start the pipeline on a machine with more cores, it can 
fail.

The default execution environment should pick a saner default based on the 
level of parallelism (or whatever is needed to ensure that the number of 
network buffers is not going to be exceeded for a given execution environment).




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to