[ https://issues.apache.org/jira/browse/FLINK-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Max Michels reassigned FLINK-1442:
----------------------------------

    Assignee: Max Michels

> Archived Execution Graph consumes too much memory
> -------------------------------------------------
>
>                 Key: FLINK-1442
>                 URL: https://issues.apache.org/jira/browse/FLINK-1442
>             Project: Flink
>          Issue Type: Bug
>          Components: JobManager
>    Affects Versions: 0.9
>            Reporter: Stephan Ewen
>            Assignee: Max Michels
>
> The JobManager archives the execution graphs for analysis of jobs. The
> graphs may consume a lot of memory.
> In particular, all2all connection patterns produce a very large number of
> execution edges, which add up in memory consumption.
> The execution edges connect all parallel tasks, so for an all2all pattern
> between n and m tasks there are n*m edges. With a parallelism of several
> hundred tasks, this can easily reach 100k objects and more, each with a set
> of metadata.
> I propose the following to solve that:
> 1. Clear all execution edges from the graph (the majority of the memory
> consumers) when it is given to the archiver.
> 2. Keep the map/list of the archived graphs behind a soft reference, so that
> it will be removed under memory pressure before the JVM crashes. That may
> remove graphs from the history early, but is much preferable to the JVM
> crashing, in which case the graph is lost as well...
> 3. Long term: The graph should be archived somewhere else. Something like
> the History server used by Hadoop and Hive would be a good idea.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
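
Proposal steps 1 and 2 can be sketched roughly as follows. This is only an illustration under stated assumptions, not Flink's actual code: SketchGraph and SketchArchive are hypothetical stand-ins for the execution graph and the archiver, and an edge is reduced to a (source, target) index pair without the metadata a real execution edge carries. The only real API used is java.lang.ref.SoftReference.

```java
import java.lang.ref.SoftReference;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for an execution graph; not Flink's actual class. */
class SketchGraph {
    final String jobId;
    // One entry per (source, target) pair; a real edge would hold more metadata.
    final List<int[]> edges = new ArrayList<int[]>();

    SketchGraph(String jobId, int n, int m) {
        this.jobId = jobId;
        // all2all: each of the n producers connects to each of the m consumers,
        // which yields n*m edges.
        for (int s = 0; s < n; s++) {
            for (int t = 0; t < m; t++) {
                edges.add(new int[] {s, t});
            }
        }
    }

    /** Step 1: drop the edges, assumed here to be unnecessary once archived. */
    void clearEdges() {
        edges.clear();
    }
}

/** Hypothetical archiver that keeps its map behind a SoftReference (step 2). */
class SketchArchive {
    private SoftReference<Map<String, SketchGraph>> ref =
            new SoftReference<Map<String, SketchGraph>>(
                    new LinkedHashMap<String, SketchGraph>());

    synchronized void archive(SketchGraph graph) {
        graph.clearEdges();
        Map<String, SketchGraph> map = ref.get();
        if (map == null) {
            // The JVM reclaimed the history under memory pressure; start over.
            map = new LinkedHashMap<String, SketchGraph>();
            ref = new SoftReference<Map<String, SketchGraph>>(map);
        }
        map.put(graph.jobId, graph);
    }

    /** Returns null if the job was never archived or the history was reclaimed. */
    synchronized SketchGraph lookup(String jobId) {
        Map<String, SketchGraph> map = ref.get();
        return map == null ? null : map.get(jobId);
    }
}
```

With n = m = 300, the sketch creates 300 * 300 = 90,000 edge objects for a single all2all connection, which is the order of magnitude the issue describes. Because the whole map sits behind one soft reference, the JVM may drop the entire history at once, so callers of lookup have to treat a null result as "no longer available".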