[ 
https://issues.apache.org/jira/browse/FLINK-12171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16849304#comment-16849304
 ] 

Yun Gao commented on FLINK-12171:
---------------------------------

After analyzing this problem further, I now think we do not need to check the 
maximum allowed memory on the TM side.

On the RM side, we compute the network memory size from the total memory size. 
The configured MIN and MAX may be so large that the resulting network memory 
exceeds the total memory size, so we need to check against that.
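
As a rough illustration of the RM-side check described above, here is a 
minimal sketch. The class name, method name and parameters are hypothetical 
and are not taken from the Flink code base; the sketch only assumes that the 
network memory is a fraction of the total memory clamped to [MIN, MAX].

```java
class RmNetworkMemoryCheck {

    /**
     * Hypothetical helper: derives the network memory from the total memory
     * as a fraction clamped to [minBytes, maxBytes], then rejects results
     * that do not fit into the total memory.
     */
    static long networkMemoryFromTotal(
            long totalBytes, double fraction, long minBytes, long maxBytes) {
        long byFraction = (long) (totalBytes * fraction);
        long network = Math.min(maxBytes, Math.max(minBytes, byFraction));
        // A too-large MIN (or MAX) can push the result past the total memory.
        if (network >= totalBytes) {
            throw new IllegalArgumentException(
                    "Network memory (" + network + " bytes) must be smaller than "
                            + "the total memory (" + totalBytes + " bytes)");
        }
        return network;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Fraction-based size lies inside [min, max], so the check passes.
        System.out.println(networkMemoryFromTotal(1024 * mb, 0.1, 64 * mb, 1024 * mb));
        // MIN is larger than the total memory, so the check fails.
        try {
            networkMemoryFromTotal(256 * mb, 0.1, 512 * mb, 1024 * mb);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```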

However, on the TM side we do not know the total memory size; we only know the 
heap size. We can only deduce the total memory size as heap size + computed 
network memory, which is always larger than the computed network memory, so 
a check against this deduced total could never fail.

Therefore, unless we make the total memory size available on the TM side and 
also compute the network memory size from the total memory size there, we 
cannot meaningfully check the network memory size.

Based on the above analysis, I think we can start by simply removing the 
comparison between the network memory size and the heap memory size. This 
comparison is incorrect because the network memory is not part of the heap 
memory, and it may raise an error when the configuration is in fact reasonable.
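
To make the argument concrete, here is a small sketch with made-up numbers. 
All names are hypothetical and only mirror the reasoning above: the total is 
deduced as heap + network, so the comparison against it always passes for a 
positive heap, while the comparison against the heap can fail for a perfectly 
valid setup.

```java
class TmNetworkMemoryCheck {

    /** Total memory as the TM can deduce it: heap size plus network memory. */
    static long deducedTotal(long heapBytes, long networkBytes) {
        return heapBytes + networkBytes;
    }

    /** Check against the deduced total: trivially true whenever heapBytes > 0. */
    static boolean passesTotalCheck(long heapBytes, long networkBytes) {
        return networkBytes < deducedTotal(heapBytes, networkBytes);
    }

    /** The comparison under discussion: network memory against the heap size. */
    static boolean passesHeapCheck(long heapBytes, long networkBytes) {
        return networkBytes < heapBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long heap = 1024 * mb;     // assumed -Xmx of 1 GB
        long network = 2048 * mb;  // assumed 2 GB of network buffers
        System.out.println("total check passes: " + passesTotalCheck(heap, network));
        System.out.println("heap check passes:  " + passesHeapCheck(heap, network));
    }
}
```

With these assumed sizes the container simply needs at least 3 GB, which is a 
legal configuration, yet the heap comparison rejects it.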

 

 

> The network buffer memory size should not be checked against the heap size on 
> the TM side
> -----------------------------------------------------------------------------------------
>
>                 Key: FLINK-12171
>                 URL: https://issues.apache.org/jira/browse/FLINK-12171
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Network
>    Affects Versions: 1.7.2, 1.8.0
>         Environment: Flink 1.7.2; Flink 1.8 does not seem to have modified 
> the logic here.
>  
>            Reporter: Yun Gao
>            Assignee: Yun Gao
>            Priority: Major
>
> Currently, when computing the network buffer memory size on the TM side in 
> _TaskManagerService#calculateNetworkBufferMemory_ (version 1.8 or 1.7) or 
> _NetworkEnvironmentConfiguration#calculateNewNetworkBufferMemory_ (master), 
> the computed network buffer memory size is checked to be less than 
> _maxJvmHeapMemory_. However, on the TM side, _maxJvmHeapMemory_ stores the 
> maximum heap memory (namely -Xmx).
>  
> With the above process, when the TM starts, -Xmx is computed in the RM or in 
> _taskmanager.sh_ as (container memory - network buffer memory - managed 
> memory). The above check therefore implies that the heap memory of the TM 
> must be larger than the network memory, which does not seem necessary.
>  
>  
> Therefore, I think the network buffer memory size needs to be checked 
> against the total memory instead of the heap memory on the TM side:
>  # Check that networkBufFraction < 1.0.
>  # Compute the total memory as (jvmHeapNoNet / (1 - networkBufFraction)).
>  # Compare the network buffer memory with the total memory.
> This check is also consistent with the similar one done on the RM side.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
