Hi, I have Flink running in two Docker containers, one for the job manager and one for the task manager, with the configuration below.
The machine has 64 GB of RAM and a 200 GB SSD used only by RocksDB.

Flink's memory configuration file looks like this:

  jobmanager.heap.mb: 3072
  taskmanager.heap.mb: 53248
  taskmanager.memory.fraction: 0.7

I have a very large, heavy job running on this server. The problem is that the task manager tries to take more memory than the configuration allows and eventually crashes the server, even though the heap never reaches its maximum. The last memory log before the crash shows:

  Memory usage stats: [HEAP: 44432/53248/53248 MB, NON HEAP: 157/160/-1 MB (used/committed/max)]

However, the memory used by the task manager container is near 64 GB. I have some doubts regarding Flink's memory usage:

1. Shouldn't the sum of the job manager memory and the task manager memory account for all the memory allocated by Flink? Am I missing any configuration?
2. How can I keep the server working in this scenario?
3. I thought that RocksDB would do the job, but that didn't happen.
4. In the past, I have seen Flink take a checkpoint of 3 GB while initially allocating 35 GB of RAM. Where does this extra memory come from?

Can anyone help me, please? Thanks in advance.

Pedro Luis
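P.S. To make question 1 concrete, here is the arithmetic I'm working from (just a sketch of my own reasoning; the only inputs are the configured heap sizes above and the 64 GB machine total):

```python
# Configured JVM heap sizes from flink-conf.yaml, in MB.
jobmanager_heap_mb = 3072
taskmanager_heap_mb = 53248

configured_heap_mb = jobmanager_heap_mb + taskmanager_heap_mb
print(configured_heap_mb)   # 56320 MB, i.e. ~55 GB of configured heap

machine_mb = 64 * 1024      # 65536 MB of RAM on the box
headroom_mb = machine_mb - configured_heap_mb
print(headroom_mb)          # 9216 MB, i.e. ~9 GB left for everything off-heap
```

So even if both heaps were fully used, roughly 9 GB should remain for everything outside the heap (RocksDB native memory, JVM metaspace, direct buffers, and so on), yet the task manager container alone approaches 64 GB, which is the part I don't understand.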
