zhijiangW commented on issue #10180: [FLINK-14631] Account for netty direct allocations in direct memory limit (Netty Shuffle)
URL: https://github.com/apache/flink/pull/10180#issuecomment-554995453

Thanks for the reply and confirmation @azagrebin !

I guess there are two issues above:

1. What is the proper default value for the number of arenas?

It is a bit hard to give a proper default value here. In the previous default setting, both the number of arenas and the number of netty threads were derived from the number of slots, because it is reasonable to keep numThreads and numArenas equal to avoid thread contention. In netty's internal implementation, the number of arenas is calculated as `maxDirectMemory/16MB/6`, because in general one arena might keep six chunks for different usage ratios.

In a lightweight case, meaning data does not accumulate in the input/output queues and not many netty threads work concurrently, the current value seems to make sense, as you mentioned for the testing scenarios. The increased netty overhead occurs almost exclusively in large-scale jobs under back pressure.

In the past, the total container memory did not take the number of arenas into account. So even though we set it to the number of slots, doing so did not increase the requested process resources. But now it might waste cluster resources if that many arenas are not used in practice. If we do not care much about the netty memory overhead, I would prefer each arena to have 6 chunks by default, rather than only one chunk as now.

2. How should the config be exposed to outside users?

There are two config options now: one is for the shuffle memory used by Flink's `NetworkBufferPool`, and the other is for netty usage via `numberOfArenas`. We add them together as one component of the container's max direct memory. My previous concern was that the `numberOfArenas` config is too expert-level for users, and we cannot give a proper explanation to guide users on how to adjust it.
But we do have a specific formula for calculating the `NetworkBufferPool` side. I think there are three options:

- Remove the `numberOfArenas` config for outside users; the framework can deduct this portion automatically from the configured shuffle memory size. But this might require adjusting the descriptions of the current network memory settings.
- Keep both config options, but change `numberOfArenas` to `nettyMemorySize`, because a size seems easier for users to understand than the arena concept. But it might seem a bit strange to have two settings for the shuffle service itself.
- Keep both config options as they are now in this PR, still using `numberOfArenas`, based on the assumption that outside users rarely touch this config in most cases.

You can choose whichever way you prefer. :)
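
To make the arithmetic in point 1 concrete, here is a rough sketch, not code from this PR. It assumes netty's default chunk size of 16 MB and the six-chunks-per-arena rule of thumb mentioned above. The class and method names are hypothetical, and netty's real default additionally bounds the arena count by the number of available cores, which is ignored here.

```java
/** Back-of-the-envelope arena arithmetic; hypothetical helper, not Flink or netty source. */
final class NettyArenaEstimate {

    // Assumption: netty's default chunk size (8 KiB page << 11 = 16 MiB).
    static final long CHUNK_SIZE_BYTES = 16L * 1024 * 1024;

    // Rule of thumb from the comment: one arena may hold about six chunks.
    static final int CHUNKS_PER_ARENA = 6;

    /** Arena count implied by maxDirectMemory / 16MB / 6 (core-count bound ignored). */
    static long arenasFromDirectMemory(long maxDirectMemoryBytes) {
        return maxDirectMemoryBytes / CHUNK_SIZE_BYTES / CHUNKS_PER_ARENA;
    }

    /** Direct memory netty may hold for the given arena count and chunks per arena. */
    static long nettyMemoryBytes(int numberOfArenas, int chunksPerArena) {
        return (long) numberOfArenas * chunksPerArena * CHUNK_SIZE_BYTES;
    }

    /** Shuffle (NetworkBufferPool) memory plus the netty portion, added together. */
    static long directMemoryBytes(long shuffleMemoryBytes, int numberOfArenas, int chunksPerArena) {
        return shuffleMemoryBytes + nettyMemoryBytes(numberOfArenas, chunksPerArena);
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024;
        int slots = 4; // hypothetical: arenas set to the number of slots

        System.out.println("arenas for 1 GiB direct memory: "
                + arenasFromDirectMemory(1024 * mib));
        System.out.println("netty MiB, one chunk per arena: "
                + nettyMemoryBytes(slots, 1) / mib);
        System.out.println("netty MiB, six chunks per arena: "
                + nettyMemoryBytes(slots, CHUNKS_PER_ARENA) / mib);
        System.out.println("direct MiB with 128 MiB shuffle memory: "
                + directMemoryBytes(128 * mib, slots, 1) / mib);
    }
}
```

The gap between the one-chunk and six-chunk estimates is the potential waste discussed in point 1: with arenas tied to the slot count, accounting for six chunks per arena reserves six times as much container memory for netty, whether or not it is used.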
