[
https://issues.apache.org/jira/browse/FLINK-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15504633#comment-15504633
]
ASF GitHub Bot commented on FLINK-3322:
---------------------------------------
Github user StephanEwen commented on the issue:
https://github.com/apache/flink/pull/2495
Is there a section that describes the design that this follows in more
detail? I would like to take a look and comment.
I am a bit skeptical whether this is going in the right direction. For
example, I see no reason why there should be a `SorterMemoryAllocator`, or any
specialization for sorters. Ideally, we also get fewer specializations for
iterative tasks, not more.
This looks like many specializations and case distinctions. Flink runs in
critical production settings and this memory allocation stuff is super
critical, so we can really only merge it when we feel super comfortable that
this is (1) rock solid and (2) a good design for the future. Otherwise this
will be a lot of code.
To achieve that, I think it would help to break this down into finer issues
and address them one at a time. I can understand that it is hard to slow down
sometimes, but for critical runtime changes like this one, I think you need to
adjust to the speed of whoever can really review and merge these fine-grained
changes.
> MemoryManager creates too much GC pressure with iterative jobs
> --------------------------------------------------------------
>
> Key: FLINK-3322
> URL: https://issues.apache.org/jira/browse/FLINK-3322
> Project: Flink
> Issue Type: Bug
> Components: Local Runtime
> Affects Versions: 1.0.0
> Reporter: Gabor Gevay
> Assignee: ramkrishna.s.vasudevan
> Priority: Critical
> Fix For: 1.0.0
>
> Attachments: FLINK-3322.docx, FLINK-3322_reusingmemoryfordrivers.docx
>
>
> When taskmanager.memory.preallocate is false (the default), released memory
> segments are not added to a pool, but the GC is expected to take care of
> them. This puts too much pressure on the GC with iterative jobs, where the
> operators reallocate all memory at every superstep.
> See the following discussion on the mailing list:
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Memory-manager-behavior-in-iterative-jobs-tt10066.html
> Reproducing the issue:
> https://github.com/ggevay/flink/tree/MemoryManager-crazy-gc
> The class to start is malom.Solver. If you increase the memory given to the
> JVM from 1 to 50 GB, performance gradually degrades by more than 10 times.
> (On the first run it will spend a few minutes generating some lookup tables in /tmp.)
> (I think the slowdown might also depend somewhat on
> taskmanager.memory.fraction, because more unused non-managed memory results
> in rarer GCs.)
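
To illustrate the behavior described in the issue, here is a minimal Java sketch. It is not Flink's actual MemoryManager: `SegmentPool`, `allocate`, `release`, and `simulate` are hypothetical names, and plain `byte[]` arrays stand in for memory segments. It only counts how many fresh allocations an iterative job triggers when released segments are pooled versus when they are left to the GC.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is NOT Flink's MemoryManager API.
final class SegmentPool {
    private final int segmentSize;
    // Assumption: "pooled" loosely mirrors the effect of taskmanager.memory.preallocate.
    private final boolean pooled;
    private final ArrayDeque<byte[]> free = new ArrayDeque<>();
    private long freshAllocations = 0;

    SegmentPool(int segmentSize, boolean pooled) {
        this.segmentSize = segmentSize;
        this.pooled = pooled;
    }

    // Hands out a released segment if one is available, otherwise allocates a new one.
    byte[] allocate() {
        byte[] segment = free.poll();
        if (segment == null) {
            segment = new byte[segmentSize];
            freshAllocations++;
        }
        return segment;
    }

    // Pooled: keep the segment for reuse. Not pooled: drop it and rely on the GC.
    void release(byte[] segment) {
        if (pooled) {
            free.push(segment);
        }
    }

    long freshAllocations() {
        return freshAllocations;
    }

    // Simulates an iterative job whose operators reallocate all memory at every superstep.
    static long simulate(int supersteps, int segmentsPerStep, boolean pooled) {
        SegmentPool pool = new SegmentPool(1024, pooled);
        for (int step = 0; step < supersteps; step++) {
            List<byte[]> held = new ArrayList<>();
            for (int i = 0; i < segmentsPerStep; i++) {
                held.add(pool.allocate());
            }
            for (byte[] segment : held) {
                pool.release(segment);
            }
        }
        return pool.freshAllocations();
    }
}
```

Under these assumptions, `SegmentPool.simulate(10, 100, false)` performs 1000 fresh allocations, while `SegmentPool.simulate(10, 100, true)` performs only 100, since every superstep after the first reuses released segments.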