zhijiangW commented on issue #10180: [FLINK-14631] Account for netty direct 
allocations in direct memory limit (Netty Shuffle)
URL: https://github.com/apache/flink/pull/10180#issuecomment-554995453
 
 
   Thanks for the reply and confirmation @azagrebin !
   
   I guess there are two issues above:
   
   1. What is the proper default value for the number of arenas?
   
   It is a bit hard to give a proper default value here. In the previous default setting, both the number of arenas and the number of netty threads were derived from the number of slots, because it is reasonable to keep numThreads and numArenas equal to avoid thread contention.
   
   In netty's internal implementation, the number of arenas is calculated as `maxDirectMemory / 16MB / 6`, because in general one arena may hold up to six chunks for different usage ratios.
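   The calculation above can be sketched as follows. This is an illustrative sketch, not the exact Netty source; the class and method names here are made up, and the CPU-based cap is an assumption about how Netty bounds the memory-derived value:
   
   ```java
   // Sketch of how a default direct-arena count could be derived from
   // maxDirectMemory / 16MB / 6 (illustrative names, not Netty's code).
   public class ArenaDefaults {
       // Assumed default chunk size: 16 MiB.
       static final long CHUNK_SIZE = 16L * 1024 * 1024;
   
       static long defaultNumDirectArenas(long maxDirectMemory, int availableProcessors) {
           // Memory-based bound: each arena is assumed to keep up to
           // six chunks alive, so limit arenas to maxDirectMemory / 16MB / 6.
           long byMemory = maxDirectMemory / CHUNK_SIZE / 6;
           // Assumed CPU-based cap, so small machines do not over-allocate arenas.
           long byCpu = availableProcessors * 2L;
           return Math.max(0, Math.min(byCpu, byMemory));
       }
   
       public static void main(String[] args) {
           // e.g. 1 GiB of direct memory on an 8-core machine
           System.out.println(defaultNumDirectArenas(1024L * 1024 * 1024, 8));
       }
   }
   ```
   
   For example, with 1 GiB of direct memory the memory bound is 1024 / 16 / 6 = 10 arenas, which is below the CPU cap of 16 on an 8-core machine.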
   
   By a light-weight case I mean that data are not accumulated in the input/output queues and not many netty threads are working concurrently; in that case the current value makes sense, as you mentioned for the testing scenarios. The increased netty overhead mostly occurs in large-scale jobs under back pressure.
   
   In the past, the total container memory did not account for the number of arenas. So even though we set it to the number of slots, it would not increase the requested process resources. But now it might waste cluster resources if that many arenas are not actually used in practice. If we do not care much about the netty memory overhead, I would prefer to make the default one arena with 6 chunks, instead of only one chunk as now.
   
   2. How should the config be exposed to outside users?
   
   There are two config options now: one for the shuffle memory used by Flink's `NetworkBufferPool`, and the other for netty usage via `numberOfArenas`. We add them together as one factor of the container's max direct memory. My previous concern was that the `numberOfArenas` config is too expert-level for users, and we cannot give a proper explanation to guide users on how to adjust it, whereas we do have a specific formula for calculating the `NetworkBufferPool` side. I think there are three options:
   
   - Remove the `numberOfArenas` config for outside users; the framework can deduct this portion automatically from the configured shuffle memory size. But this might require adjusting the descriptions of the current network memory settings.
   
   - Keep both config options, but change `numberOfArenas` to `nettyMemorySize`, because a memory size is easier for users to understand than the arena concept. But it might seem a bit strange to have two settings for the shuffle service itself.
   
   - Keep both config options as in this PR, still using `numberOfArenas`, based on the assumption that this config is rarely touched by outside users in most cases.
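   Whichever option we pick, the underlying direct-memory accounting is the same. A minimal sketch, assuming the one-chunk-per-arena accounting discussed above (class and method names are illustrative, not Flink's actual code):
   
   ```java
   // Illustrative sketch of combining the two config options into the
   // container's max direct memory budget (not Flink's actual code).
   public class DirectMemoryBudget {
       // Assumed netty default chunk size: 16 MiB.
       static final long CHUNK_SIZE = 16L * 1024 * 1024;
   
       static long maxDirectMemoryBytes(long shuffleMemoryBytes, int numberOfArenas) {
           // Shuffle memory covers Flink's NetworkBufferPool; each netty
           // arena is assumed to contribute one chunk of direct memory.
           return shuffleMemoryBytes + (long) numberOfArenas * CHUNK_SIZE;
       }
   }
   ```
   
   For example, 64 MiB of shuffle memory plus 2 arenas yields a 96 MiB direct memory budget under this accounting; defaulting to 6 chunks per arena would simply scale the arena term by 6.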
   
   In the end, you can choose whichever way you prefer. :)

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
