Yun Gao created FLINK-12171:
-------------------------------

             Summary: The network buffer memory size should not be checked 
against the heap size on the TM side
                 Key: FLINK-12171
                 URL: https://issues.apache.org/jira/browse/FLINK-12171
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Configuration, Runtime / Network
    Affects Versions: 1.8.0, 1.7.2
         Environment: Tested on Flink 1.7.2 with a computed network buffer 
size of 5G and taskmanager.heap.mb=6114; the check fails and the exception is 
thrown. YARN session mode, YARN single-job mode, and standalone mode were all 
tested.

 

I haven't tested on Flink 1.8 yet, but from reading the corresponding source 
code the logic appears to be unchanged.
            Reporter: Yun Gao


Currently, when the network buffer memory size is computed on the TM side in 
_TaskManagerService#calculateNetworkBufferMemory_ (version 1.8 or 1.7) or 
_NetworkEnvironmentConfiguration#calculateNewNetworkBufferMemory_ (master), the 
computed size is checked to be less than _maxJvmHeapMemory_. However, on the TM 
side, _maxJvmHeapMemory_ holds the maximum heap memory (namely -Xmx).

 

When the TM starts, -Xmx is computed in the RM or in _taskmanager.sh_ as 
(container memory - network buffer memory - managed memory). The check above 
therefore implies that the heap memory of the TM must be larger than the 
network memory, which does not seem necessary.
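
To make the arithmetic concrete, here is a minimal sketch using the numbers from the Environment field. It is not Flink code: the class and method names are hypothetical, "5G" is assumed to mean 5120 MB, taskmanager.heap.mb is treated as the total memory before subtraction (any container cutoff is ignored), and no managed memory is assumed to be subtracted.

```java
final class HeapVersusNetworkExample {

    // -Xmx as derived at start-up according to the description above (values in MB).
    static long xmxMb(long containerMb, long networkMb, long managedMb) {
        return containerMb - networkMb - managedMb;
    }

    // The check as described in this issue: network memory must be below the max heap.
    static boolean passesCurrentCheck(long networkMb, long maxJvmHeapMb) {
        return networkMb < maxJvmHeapMb;
    }

    public static void main(String[] args) {
        long containerMb = 6114;     // taskmanager.heap.mb from the Environment field
        long networkMb = 5L * 1024;  // "5G", assumed to mean 5120 MB
        long managedMb = 0;          // assumption: nothing subtracted for managed memory

        long xmx = xmxMb(containerMb, networkMb, managedMb);
        System.out.println("-Xmx = " + xmx + " MB");
        System.out.println("network < heap:  " + passesCurrentCheck(networkMb, xmx));
        System.out.println("network < total: " + (networkMb < containerMb));
    }
}
```

Under these assumptions -Xmx comes out at 994 MB, so a 5120 MB network buffer pool is rejected by the heap comparison even though it fits within the 6114 MB total.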

 

 

Therefore, I think the network buffer memory size should be checked against 
the total memory instead of the heap memory on the TM side:
 # Check that networkBufFraction < 1.0.
 # Compute the total memory as jvmHeapNoNet / (1 - networkBufFraction).
 # Compare the network buffer memory with the total memory.
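
The three steps could look roughly like the following. This is only a sketch of the proposal, not the actual Flink implementation: the class and method names are hypothetical, sizes are plain byte counts, and off-heap managed memory is ignored.

```java
final class NetworkBufferCheckSketch {

    // Steps 1 and 2: validate the fraction, then recover the total memory from
    // the heap size that remains after the network memory was subtracted.
    static long totalMemory(long jvmHeapNoNet, double networkBufFraction) {
        if (!(networkBufFraction < 1.0)) {
            throw new IllegalArgumentException(
                "Network buffer fraction must be less than 1.0: " + networkBufFraction);
        }
        return (long) (jvmHeapNoNet / (1.0 - networkBufFraction));
    }

    // Step 3: the network buffer memory must fit inside the total memory,
    // rather than inside the heap.
    static void checkNetworkMemory(
            long networkBufBytes, long jvmHeapNoNet, double networkBufFraction) {
        long total = totalMemory(jvmHeapNoNet, networkBufFraction);
        if (networkBufBytes >= total) {
            throw new IllegalArgumentException(
                "Network buffer memory (" + networkBufBytes
                    + ") must be smaller than the total memory (" + total + ")");
        }
    }

    public static void main(String[] args) {
        long heap = 750L << 20;   // hypothetical -Xmx of 750 MB, in bytes
        double fraction = 0.25;   // hypothetical network buffer fraction
        long total = totalMemory(heap, fraction);
        long network = total - heap;
        checkNetworkMemory(network, heap, fraction); // passes: network < total
        System.out.println("total=" + total + " network=" + network);
    }
}
```

Because the comparison is against the reconstructed total, a configuration whose network memory exceeds the heap is accepted as long as it still fits within the total.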

This check is also consistent with the similar one done on the RM side.



