[
https://issues.apache.org/jira/browse/FLINK-12171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yun Gao updated FLINK-12171:
----------------------------
Affects Version/s: 1.9.0
> The network buffer memory size should not be checked against the heap size on
> the TM side
> -----------------------------------------------------------------------------------------
>
> Key: FLINK-12171
> URL: https://issues.apache.org/jira/browse/FLINK-12171
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Network
> Affects Versions: 1.7.2, 1.8.0, 1.9.0
> Environment: Flink 1.7.2; Flink 1.8 does not seem to have changed the
> logic here.
>
> Reporter: Yun Gao
> Assignee: Yun Gao
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Currently, when computing the network buffer memory size on the TM side in
> _TaskManagerService#calculateNetworkBufferMemory_ (version 1.8 or 1.7) or
> _NetworkEnvironmentConfiguration#calculateNewNetworkBufferMemory_ (master),
> the computed network buffer memory size is checked to be less than
> _maxJvmHeapMemory_. However, on the TM side, _maxJvmHeapMemory_ stores the
> maximum heap memory (namely -Xmx).
>
> When the TM starts, -Xmx is computed in the RM or in _taskmanager.sh_ as
> (container memory - network buffer memory - managed memory). The above check
> therefore implies that the heap memory of the TM must be larger than the
> network memory, which does not seem necessary.
>
> This may cause the TM to use more memory than expected. For example, for a
> job with a large network throughput, users may configure the network memory
> to 2G. However, if users want to assign 1G to heap memory, the TM will fail
> to start, and the user has to allocate at least 2G of heap memory (in other
> words, 4G in total for the TM instead of 3G) to make the TM runnable. This
> may cause resource inefficiency.
>
> Therefore, I think the network buffer memory size should be checked
> against the total memory instead of the heap memory on the TM side:
> # Check that networkBufFraction < 1.0.
> # Compute the total memory as (jvmHeapNoNet / (1 - networkBufFraction)).
> # Compare the network buffer memory with the total memory.
> This check is also consistent with the similar one done on the RM side.
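
To illustrate the difference between the two checks, here is a minimal sketch in plain Java. It is not Flink code: the class `NetworkMemoryCheck` and both method names are hypothetical, the strict less-than comparison in the proposed check is an assumption, and managed memory is ignored. The fraction of 2/3 is chosen only so that the numbers match the 2G network / 1G heap example above.

```java
// Hypothetical illustration only; none of these names exist in Flink.
final class NetworkMemoryCheck {

    static final long GIB = 1024L * 1024L * 1024L;

    // Current behaviour as described in the issue: the network buffer
    // memory must be smaller than the maximum JVM heap size (-Xmx).
    static boolean passesHeapCheck(long networkBufBytes, long maxJvmHeapBytes) {
        return networkBufBytes < maxJvmHeapBytes;
    }

    // Proposed behaviour, following the three steps listed in the issue.
    static boolean passesTotalCheck(long networkBufBytes,
                                    long jvmHeapNoNetBytes,
                                    double networkBufFraction) {
        // Step 1: the fraction must be below 1.0 (this also rejects NaN).
        if (!(networkBufFraction < 1.0)) {
            throw new IllegalArgumentException(
                "networkBufFraction must be < 1.0, got " + networkBufFraction);
        }
        // Step 2: derive the total memory from the heap size without network memory.
        long totalBytes = (long) (jvmHeapNoNetBytes / (1.0 - networkBufFraction));
        // Step 3: compare the network buffer memory with the total memory
        // (strict less-than is an assumption of this sketch).
        return networkBufBytes < totalBytes;
    }

    public static void main(String[] args) {
        long network = 2 * GIB; // configured network memory
        long heap = 1 * GIB;    // desired heap memory (-Xmx)
        System.out.println("heap check passes:  " + passesHeapCheck(network, heap));
        System.out.println("total check passes: "
            + passesTotalCheck(network, heap, 2.0 / 3.0));
    }
}
```

With these inputs the heap-based check rejects the configuration, while the total-based check accepts it, since the derived total is roughly 3G.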
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)