I like this approach and would suggest to make the ratio configurable. The default could be 50/50 or 60/40 (op heap / net heap)
On Mon, Feb 2, 2015 at 6:45 PM, Ufuk Celebi <u.cel...@fu-berlin.de> wrote: > Currently, the memory configuration of a task manager encompasses two > things: > > 1) NETWORK buffers: Fixed amount of memory for the network buffer pool > (default: 2048 buffers, each 32 KB => 64 MB) > > 2) OPERATOR buffers: A configurable fraction of the available heap memory > (default 0.7) for the memory manager used by internal operators like sort > or hash > > With the recently added supported for intermediate results, intermediate > results live in the network stack and use buffers from the network buffer > pool. Currently, our memory management is not a problem, because we only > support ephemeral intermediate results, which are directly consumed in a > pipelined fashion (i.e. the buffer pools are short-lived). > > But with the upcoming support for persistent intermediate results ( > https://github.com/apache/flink/pull/356) and fine-grained fault > tolerance, we need to rethink how we configure/divide the available memory > between the network stack and the operators as the network buffer pools > will live longer. > > I would suggest the following: > > 1) Remove the configuration for network buffers > > 2) Keep the fraction configuration, but internally divide it between > network stack and operators in a 50:50 fashion (in the future with dynamic > memory management, we will not even need to statically divide the memory > between network stack and operators). > > > What do you think? Is this reasonable? Should the division be configurable > as well? > > – Ufuk