Bhumika Bayani created FLINK-8622:
-------------------------------------
Summary: flink-mesos: High memory usage of scheduler + job
manager. GC never kicks in.
Key: FLINK-8622
URL: https://issues.apache.org/jira/browse/FLINK-8622
Project: Flink
Issue Type: Bug
Affects Versions: 1.3.2, 1.4.0
Reporter: Bhumika Bayani
We are deploying a Flink cluster on Mesos with 1 JobManager and 6 TaskManagers.
We have observed that memory usage of the JobManager is high. No matter how much
memory we allocate to it, it hits the limit within minutes.
We started with 1.5 GB RAM and a 1 GB heap. Currently the combined
JobManager/scheduler has 4 GB RAM and a 3 GB heap. We also tried 8 GB RAM with
the same 3 GB heap; in that case the memory graph was identical.
As the graph below shows, the scheduler almost always runs at its maximum memory
allocation.
!flink-mem-usage-graph-for-jira.png!
Throughout the scheduler's run, we never see memory usage go down until the
process is killed due to OOM. From this we infer that garbage collection is
never happening.
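The inference above is drawn from the container-level memory graph. Because a JVM does not necessarily return freed heap memory to the operating system, a more direct check is to read the JVM's own GC counters through the standard `java.lang.management` API. The following is a minimal sketch only; the class name `GcCheck` is hypothetical (not part of Flink), and it reports on the JVM it runs in, so applying it to the JobManager assumes running it inside that process or adapting it to a remote JMX connection.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical diagnostic class (not part of Flink): reports GC activity and
// heap usage for the JVM it runs in.
public class GcCheck {

    // Sum of collections reported by every collector. getCollectionCount()
    // returns -1 when a collector does not define the value, so skip those.
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    // One line per collector, plus the current heap usage in MB.
    static String report() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(String.format("%s: count=%d, time=%d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime()));
        }
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        sb.append(String.format("heap used=%d MB, committed=%d MB, max=%d MB%n",
                heap.getUsed() >> 20, heap.getCommitted() >> 20, heap.getMax() >> 20));
        return sb.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        // Sample a few times, one second apart. A count that rises between
        // samples means the collector is running, even if container-level
        // memory stays flat.
        for (int i = 0; i < 3; i++) {
            System.out.print(report());
            System.out.println("total collections so far: " + totalGcCount());
            Thread.sleep(1000);
        }
    }
}
```

For an already running JobManager, the JDK's `jstat -gcutil <pid> 1000` tool gives a similar per-second view of collection counts without any code changes.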
We have tried both Flink 1.3 and 1.4 and see the same issue on both versions.
Is there any way to find out where and how memory is being used?
Are there any Flink configuration options for the JobManager, or JVM parameters,
that can help us restrict memory usage, force garbage collection, and prevent
the process from crashing?
Please let us know if Flink has any resource recommendations for running Flink
on Mesos at scale.