[ 
https://issues.apache.org/jira/browse/FLINK-8622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhumika Bayani updated FLINK-8622:
----------------------------------
    Attachment: flink-mem-usage-graph-for-jira.png

> flink-mesos: High memory usage of scheduler + job manager. GC never kicks in.
> -----------------------------------------------------------------------------
>
>                 Key: FLINK-8622
>                 URL: https://issues.apache.org/jira/browse/FLINK-8622
>             Project: Flink
>          Issue Type: Bug
>          Components: Distributed Coordination, Mesos, ResourceManager
>    Affects Versions: 1.4.0, 1.3.2
>            Reporter: Bhumika Bayani
>            Priority: Blocker
>             Fix For: 1.5.0
>
>         Attachments: flink-mem-usage-graph-for-jira.png
>
>
> We are deploying a Flink cluster with 1 jobmanager and 6 taskmanagers on Mesos.
> We have observed that the memory usage of the jobmanager is high. Even as we 
> allocate more and more memory to it, it hits the limit within minutes.
> We started with 1.5 GB RAM and a 1 GB heap. Currently we have allocated 4 
> GB RAM and a 3 GB heap to the jobmanager cum scheduler. We also tried 8 GB 
> RAM with the same 3 GB heap. In that case too, the memory graph was 
> identical.
> As per the graph below, the scheduler almost always runs at its maximum 
> memory allocation.
> !flink-mem-usage-graph-for-jira.png!
>  
> Throughout the run of the scheduler, we do not see memory usage going down 
> unless it is killed due to OOM. From this we infer that garbage collection 
> is never happening.
> We have tried both Flink 1.4 and 1.3 and see the same issue on both 
> versions.
>  
> Is there any way we can find out where and how memory is being used? 
> Are there any Flink config options for the jobmanager, or JVM parameters, 
> that can help us restrict the memory usage, force garbage collection, and 
> prevent it from crashing? 
> Please let us know if there are any resource recommendations from Flink for 
> running Flink on Mesos at scale.
>  
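
On the question of where the memory goes: a container-level usage graph cannot by itself show whether garbage collection runs, because a JVM often keeps heap it has already committed even after collecting it. The following is a minimal sketch, not part of the original report, that uses the standard java.lang.management beans to print heap usage, non-heap usage, and per-collector GC counts. The class name JvmMemoryReport and its helper methods are hypothetical. Run as-is it reports on its own JVM only; reading the same beans from a running jobmanager would need a JMX connection, which is not shown here.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper class, for illustration only.
public class JvmMemoryReport {

    private static final long MB = 1024L * 1024L;

    /** Formats one MemoryUsage snapshot as a single line, in whole megabytes. */
    static String describe(String label, MemoryUsage usage) {
        // getMax() returns -1 when no limit is defined (common for non-heap memory).
        String max = usage.getMax() < 0 ? "undefined" : (usage.getMax() / MB) + " MB";
        return String.format("%s: used=%d MB, committed=%d MB, max=%s",
                label, usage.getUsed() / MB, usage.getCommitted() / MB, max);
    }

    /** Sums collection counts over all collectors, skipping undefined (-1) values. */
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        System.out.println(describe("heap", memory.getHeapMemoryUsage()));
        System.out.println(describe("non-heap", memory.getNonHeapMemoryUsage()));

        // A growing count here means GC is running, even if container usage stays flat.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("gc %s: count=%d, time=%d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        System.out.println("total collections: " + totalGcCount());
    }
}
```

If the heap figure stays well below its maximum while the container still reaches its limit, the growth is more likely outside the Java heap (for example direct buffers or native allocations), though that is only a hypothesis to check, not a diagnosis. On a live process, `jstat -gcutil <pid>` gives a comparable view of GC activity without any code changes.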



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
