Based on Kurt's scenario, when the cumulator allocates a large ByteBuf from the
ByteBufAllocator during expansion, it can easily trigger the creation of a new
PoolChunk (16 MB) because there is no contiguous memory left in the current
PoolChunks. This pushes the total direct memory usage beyond what was estimated.

For further explanation:
1. Each PoolArena maintains a list of PoolChunks, and the PoolChunks are grouped
into different lists based on their memory usage.
2. Each PoolChunk contains a list of subpages (8 KB) which are organized as a
complete balanced binary tree so that memory can be allocated easily.
3. When a given length of memory is requested from the ByteBufAllocator, the
PoolArena loops over all of its current PoolChunks to find enough contiguous
memory. If none is found, it creates a new chunk.
For example, if the memory usage of a chunk is 50%, there is 8 MB of room
available in that chunk. If the requested length is small, the chunk can satisfy
the allocation in most cases. But if the length is large, say 1 MB, the
remaining 50% may still not be enough, because the available subpages may not
all sit under the same parent node in the tree, as illustrated below.
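To make this concrete, here is a small self-contained Java toy (not Netty's
actual PoolChunk code) that models a 16 MB chunk as 2048 pages of 8 KB: exactly
half of the pages are marked used, but spread so that every 1 MB node of the
buddy tree contains some used pages, and a 1 MB allocation can no longer be
served even though 8 MB is free.

public class BuddyFragmentationDemo {
    // Toy model of a pooled chunk: 16 MB split into 2048 pages of 8 KB,
    // managed as a buddy tree. A 1 MB allocation needs one fully free,
    // aligned node of 128 pages.
    static final int PAGES = 2048;             // 16 MB / 8 KB
    static final int PAGES_PER_1MB_NODE = 128; // 1 MB / 8 KB

    public static void main(String[] args) {
        boolean[] used = new boolean[PAGES];
        // Occupy the first half of every 128-page node: total usage is 50%.
        for (int node = 0; node < PAGES / PAGES_PER_1MB_NODE; node++) {
            for (int p = 0; p < PAGES_PER_1MB_NODE / 2; p++) {
                used[node * PAGES_PER_1MB_NODE + p] = true;
            }
        }
        // Check whether any aligned 128-page node is completely free.
        boolean canAllocate1MB = false;
        for (int node = 0; node < PAGES / PAGES_PER_1MB_NODE; node++) {
            boolean free = true;
            for (int p = 0; p < PAGES_PER_1MB_NODE; p++) {
                free &= !used[node * PAGES_PER_1MB_NODE + p];
            }
            canAllocate1MB |= free;
        }
        // Prints: usage = 50%, can allocate 1 MB? false
        System.out.println("usage = 50%, can allocate 1 MB? " + canAllocate1MB);
    }
}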
After the network improvements mentioned in Stephan's FLIP, the direct memory
used by Netty's pooled ByteBufs can be largely reduced and controlled much more
easily.
Cheers,
Zhijiang
------------------------------------------------------------------
From: Kurt Young <k...@apache.org>
Sent: Friday, June 30, 2017, 15:51
To: dev <dev@flink.apache.org>; user <u...@flink.apache.org>
Subject: An addition to Netty's memory footprint
Hi,
Ufuk has written up an excellent document about Netty's memory allocation
inside Flink [1], and I want to add one more note after running some
large-scale jobs.
The only inaccurate thing about [1] is how much memory
LengthFieldBasedFrameDecoder will use. From our observations, it will cost up
to 4 MB for each physical connection.
Why (tl;dr): ByteToMessageDecoder, the base class of
LengthFieldBasedFrameDecoder, uses a Cumulator to keep the received bytes
around for further decoding. The Cumulator tries to discard some already-read
bytes to make room in the buffer when channelReadComplete is triggered. In most
cases, channelReadComplete is only triggered by AbstractNioByteChannel after it
has performed up to "maxMessagesPerRead" reads. The default value of
maxMessagesPerRead is 16, so in the worst case the Cumulator holds up to 1 MB
(64 KB * 16) of data. And due to the logic of ByteBuf's discardSomeReadBytes,
the Cumulator can expand to 4 MB.
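As a quick back-of-the-envelope check of those numbers (treating the
64 KB-per-read figure as an assumption about the socket receive buffer in this
setup, not a universal Netty constant):

public class CumulatorWorstCase {
    public static void main(String[] args) {
        int maxMessagesPerRead = 16;   // Netty default
        int bytesPerRead = 64 * 1024;  // assumed ~64 KB per socket read
        int accumulated = maxMessagesPerRead * bytesPerRead;
        // Bytes the Cumulator can hold before channelReadComplete fires: 1 MB
        System.out.println("accumulated before channelReadComplete: " + accumulated);
        // Since discardSomeReadBytes() does not always reclaim space, the
        // backing buffer can grow to roughly 4x that before bytes are freed,
        // matching the ~4 MB per connection we observed.
        System.out.println("observed worst-case cumulator size: " + 4 * accumulated);
    }
}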
We added an option to tune maxMessagesPerRead, set it to 2, and everything
works fine. I know Stephan is working on network improvements, and it would be
a good choice to replace the whole Netty pipeline with Flink's own
implementation. But I think we will face similar logic when implementing it,
so we should be careful about this.
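For illustration only, with the Netty 4.x API the limit could be applied on the
server bootstrap roughly like this (how the actual Flink config key is wired in
is not shown here; newer Netty versions may steer this through the channel's
RecvByteBufAllocator instead):

import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.ChannelOption;

public class ReadLimitSketch {
    // Lower maxMessagesPerRead for accepted connections so the Cumulator
    // sees fewer reads per channelReadComplete cycle.
    static void limitReadsPerLoop(ServerBootstrap bootstrap) {
        bootstrap.childOption(ChannelOption.MAX_MESSAGES_PER_READ, 2);
        // or per channel: channel.config().setMaxMessagesPerRead(2);
    }
}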
BTW, should we open a jira to add this config?

[1] https://cwiki.apache.org/confluence/display/FLINK/Netty+memory+allocation
