[ 
https://issues.apache.org/jira/browse/FLINK-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15474531#comment-15474531
 ] 

Gabor Gevay commented on FLINK-3322:
------------------------------------

I had an offline chat with [~StephanEwen], and an alternative design came up, 
which we should also consider:

There is the {{ResettableDriver}} interface, which is implemented by those 
drivers that need to retain some state between iteration steps. 
{{AbstractIterativeTask}} uses this interface as follows: it checks whether the 
driver is an instance of {{ResettableDriver}}, and if so, it does not destroy 
and recreate the driver between iteration steps but instead just calls 
{{reset}} on it. We could make every driver implement this interface, and in 
their {{reset}} methods they would simply hold on to the memory they already 
have.
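To make the difference concrete, here is a minimal sketch of that superstep loop. It assumes heavily simplified, hypothetical stand-ins ({{SimpleDriver}}, {{SimpleResettableDriver}}, {{runIteration}}, {{allocationsFor}}); these are not Flink's actual driver interfaces or the real {{AbstractIterativeTask}} code. It only counts how often working memory gets allocated when a driver is reset versus destroyed and recreated.

```java
import java.util.Arrays;
import java.util.function.Supplier;

public class ResettableDriverSketch {

    // Hypothetical, simplified stand-ins; NOT Flink's real driver interfaces.
    interface SimpleDriver {
        void prepare();              // acquire working memory
        void run(int superstep);     // process one superstep
        void cleanup();              // release working memory
    }

    interface SimpleResettableDriver extends SimpleDriver {
        void reset();                // keep the memory, clear only per-superstep state
    }

    // Counts how many times any driver allocated its working memory.
    static int allocations = 0;

    static class PlainDriver implements SimpleDriver {
        protected byte[] memory;
        @Override public void prepare() { memory = new byte[1 << 20]; allocations++; }
        @Override public void run(int superstep) { memory[0] = (byte) superstep; }
        @Override public void cleanup() { memory = null; }   // the array is left for the GC
    }

    static class HoldingDriver extends PlainDriver implements SimpleResettableDriver {
        @Override public void reset() { Arrays.fill(memory, (byte) 0); }  // reuse, no new allocation
    }

    // Mirrors the loop described above: reset resettable drivers,
    // destroy and recreate all others between supersteps.
    static void runIteration(Supplier<SimpleDriver> factory, int supersteps) {
        SimpleDriver driver = factory.get();
        driver.prepare();
        for (int step = 1; step <= supersteps; step++) {
            if (step > 1) {
                if (driver instanceof SimpleResettableDriver) {
                    ((SimpleResettableDriver) driver).reset();
                } else {
                    driver.cleanup();
                    driver = factory.get();
                    driver.prepare();
                }
            }
            driver.run(step);
        }
        driver.cleanup();
    }

    static int allocationsFor(boolean resettable, int supersteps) {
        allocations = 0;
        Supplier<SimpleDriver> factory = resettable ? HoldingDriver::new : PlainDriver::new;
        runIteration(factory, supersteps);
        return allocations;
    }

    public static void main(String[] args) {
        System.out.println("non-resettable: " + allocationsFor(false, 10));
        System.out.println("resettable:     " + allocationsFor(true, 10));
    }
}
```

With 10 supersteps, the non-resettable driver allocates its working memory 10 times, whereas the resettable one allocates it only once; that repeated allocation is the garbage the proposal aims to avoid.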

This would also simplify the code, because it would eliminate the special-case 
handling that distinguishes between resettable and non-resettable operators in 
many different places.

Since touching every driver probably can't be avoided anyway, this looks like 
the cleanest solution.

> MemoryManager creates too much GC pressure with iterative jobs
> --------------------------------------------------------------
>
>                 Key: FLINK-3322
>                 URL: https://issues.apache.org/jira/browse/FLINK-3322
>             Project: Flink
>          Issue Type: Bug
>          Components: Local Runtime
>    Affects Versions: 1.0.0
>            Reporter: Gabor Gevay
>            Priority: Critical
>             Fix For: 1.0.0
>
>         Attachments: FLINK-3322.docx
>
>
> When taskmanager.memory.preallocate is false (the default), released memory 
> segments are not returned to a pool; instead, the GC is expected to reclaim 
> them. This puts too much pressure on the GC with iterative jobs, where the 
> operators reallocate all of their memory in every superstep.
> See the following discussion on the mailing list:
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Memory-manager-behavior-in-iterative-jobs-tt10066.html
> Reproducing the issue:
> https://github.com/ggevay/flink/tree/MemoryManager-crazy-gc
> The class to start is malom.Solver. If you increase the memory given to the 
> JVM from 1 to 50 GB, performance gradually degrades by more than a factor of 
> 10. (On the first run, it spends a few minutes generating some lookup tables 
> in /tmp.) 
> (I think the slowdown might also depend somewhat on 
> taskmanager.memory.fraction, because more unused non-managed memory results 
> in less frequent GCs.)
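The difference between pooling released segments and leaving them to the GC, as described in the quoted issue, can be illustrated with a minimal sketch. {{SimpleMemoryManager}}, its {{pooling}} flag, and {{freshAllocations}} are hypothetical: the flag only plays the role of taskmanager.memory.preallocate, the class is not Flink's {{MemoryManager}} API, and the 32 KiB segment size is an arbitrary choice for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

public class SegmentPoolSketch {

    // Hypothetical, simplified stand-in for a memory manager; NOT Flink's MemoryManager API.
    static final class SimpleMemoryManager {
        private final int segmentSize;
        private final boolean pooling;                 // plays the role of preallocate = true
        private final ArrayDeque<byte[]> pool = new ArrayDeque<>();
        int freshAllocations = 0;                      // number of new byte[] created

        SimpleMemoryManager(int segmentSize, boolean pooling) {
            this.segmentSize = segmentSize;
            this.pooling = pooling;
        }

        List<byte[]> allocate(int numSegments) {
            List<byte[]> segments = new ArrayList<>(numSegments);
            for (int i = 0; i < numSegments; i++) {
                byte[] seg = pool.poll();              // reuse a pooled segment if available
                if (seg == null) {
                    seg = new byte[segmentSize];       // otherwise create a new one
                    freshAllocations++;
                }
                segments.add(seg);
            }
            return segments;
        }

        void release(List<byte[]> segments) {
            if (pooling) {
                pool.addAll(segments);                 // keep them for the next request
            }                                          // else: dropped, left for the GC
            segments.clear();
        }
    }

    // Simulates operators that reallocate all of their memory in every superstep.
    static int freshAllocations(boolean pooling, int supersteps, int segmentsPerStep) {
        SimpleMemoryManager mm = new SimpleMemoryManager(32 * 1024, pooling);
        for (int step = 0; step < supersteps; step++) {
            List<byte[]> segs = mm.allocate(segmentsPerStep);
            mm.release(segs);
        }
        return mm.freshAllocations;
    }

    public static void main(String[] args) {
        System.out.println("without pool: " + freshAllocations(false, 20, 64));
        System.out.println("with pool:    " + freshAllocations(true, 20, 64));
    }
}
```

With 20 supersteps of 64 segments each, the non-pooling variant creates 1280 fresh segments, every one of which becomes garbage, while the pooling variant creates only the first 64 and reuses them afterwards.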



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)