Thanks for the clarification Xintong. I understand the two alternatives now.

I would be in favour of option 2 because it makes things explicit. If we
don't limit the direct memory, I fear that we might end up in a similar
situation as we are currently in: the user sees that her process gets killed
by the OS and does not know why. Consequently, she tries to decrease the
sizes of the other memory pools (similar to increasing the cutoff ratio) in
order to accommodate the extra direct memory. Even worse, she tries to
decrease memory budgets which are not fully used and hence won't change the
overall memory consumption.
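
For illustration, here is a minimal JVM-level sketch (plain Java, not Flink
code) of the explicitness argument: with an explicit -XX:MaxDirectMemorySize
cap, the failure surfaces inside the JVM as an OutOfMemoryError that names
the direct buffer pool, whereas without a cap the user only sees the
container/OS kill and has to guess the cause.

    import java.nio.ByteBuffer;
    import java.util.ArrayList;
    import java.util.List;

    // Run with e.g. -XX:MaxDirectMemorySize=200m (alternative 2 style).
    public class DirectLimitDemo {
        public static void main(String[] args) {
            List<ByteBuffer> buffers = new ArrayList<>();
            try {
                while (true) {
                    // Each direct allocation counts against -XX:MaxDirectMemorySize.
                    buffers.add(ByteBuffer.allocateDirect(16 * 1024 * 1024));
                }
            } catch (OutOfMemoryError e) {
                // Typically "java.lang.OutOfMemoryError: Direct buffer memory",
                // which points the user at the direct memory budget.
                System.err.println("Direct memory cap reached: " + e);
            }
        }
    }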

Cheers,
Till

On Fri, Aug 16, 2019 at 11:01 AM Xintong Song <tonysong...@gmail.com> wrote:

> Let me explain this with a concrete example Till.
>
> Let's say we have the following scenario.
>
> Total Process Memory: 1GB
> JVM Direct Memory (Task Off-Heap Memory + JVM Overhead): 200MB
> Other Memory (JVM Heap Memory, JVM Metaspace, Off-Heap Managed Memory and
> Network Memory): 800MB
>
>
> For alternative 2, we set -XX:MaxDirectMemorySize to 200MB.
> For alternative 3, we set -XX:MaxDirectMemorySize to a very large value,
> let's say 1TB.
>
> If the actual direct memory usage of Task Off-Heap Memory and JVM Overhead
> does not exceed 200MB, then alternative 2 and alternative 3 should have the
> same utility. Setting a larger -XX:MaxDirectMemorySize will not reduce the
> sizes of the other memory pools.
>
> If the actual direct memory usage of Task Off-Heap Memory and JVM Overhead
> may exceed 200MB, then
>
>    - Alternative 2 suffers from frequent OOMs. To avoid that, the only thing
>    the user can do is to modify the configuration and increase JVM Direct
>    Memory (Task Off-Heap Memory + JVM Overhead). Let's say the user increases
>    JVM Direct Memory to 250MB; this will reduce the total size of the other
>    memory pools to 750MB, given that the total process memory remains 1GB.
>    - For alternative 3, there is no chance of a direct OOM. There are chances
>    of exceeding the total process memory limit, but given that the process
>    may not use up all the reserved native memory (Off-Heap Managed Memory,
>    Network Memory, JVM Metaspace), if the actual direct memory usage is
>    slightly above yet very close to 200MB, the user probably does not need
>    to change the configuration.
>
> Therefore, I think from the user's perspective, a feasible configuration
> for alternative 2 may lead to lower resource utilization compared to
> alternative 3.
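>
> For illustration only (a plain-Java sketch, not Flink code, reusing the
> numbers above with 1GB treated as 1000MB for simplicity), the arithmetic
> behind the two alternatives looks like this:
>
>     public class MemoryBudgetSketch {
>         public static void main(String[] args) {
>             long totalProcess = 1000; // MB, total process memory
>             long directBudget = 200;  // MB, Task Off-Heap Memory + JVM Overhead
>             long otherPools = totalProcess - directBudget;          // 800 MB
>
>             // Alternative 2: after hitting a direct OOM, the user raises the
>             // direct budget; the other pools shrink because the total is fixed.
>             long increasedDirect = 250;
>             long shrunkOtherPools = totalProcess - increasedDirect; // 750 MB
>
>             // Alternative 3: the direct cap is effectively unbounded, so the
>             // other pools keep their 800 MB, but actual direct usage above
>             // 200 MB can push the process past the 1000 MB container budget.
>             System.out.printf("alt 2 other pools: %d MB, alt 3 other pools: %d MB%n",
>                     shrunkOtherPools, otherPools);
>         }
>     }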
>
> Thank you~
>
> Xintong Song
>
>
>
> On Fri, Aug 16, 2019 at 10:28 AM Till Rohrmann <trohrm...@apache.org>
> wrote:
>
> > I guess you have to help me understand the difference between alternatives
> > 2 and 3 wrt memory under-utilization, Xintong.
> >
> > - Alternative 2: set -XX:MaxDirectMemorySize to Task Off-Heap Memory and
> > JVM Overhead. Then there is the risk that this size is too low, resulting
> > in a lot of garbage collection and potentially an OOM.
> > - Alternative 3: set -XX:MaxDirectMemorySize to something larger than
> > alternative 2. This would of course reduce the sizes of the other memory
> > types.
> >
> > How would alternative 2 now result in an under-utilization of memory
> > compared to alternative 3? If alternative 3 strictly sets a higher max
> > direct memory size and we only use a little of it, then I would expect
> > alternative 3 to result in memory under-utilization.
> >
> > Cheers,
> > Till
> >
> > On Tue, Aug 13, 2019 at 4:19 PM Yang Wang <danrtsey...@gmail.com> wrote:
> >
> > > Hi Xintong, Till,
> > >
> > >
> > > > Native and Direct Memory
> > >
> > > My point is to set a very large max direct memory size when we do not
> > > differentiate direct and native memory. If the direct memory, including
> > > user direct memory and framework direct memory, could be calculated
> > > correctly, then I am in favor of setting direct memory to a fixed value.
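> > >
> > > For illustration only (a plain-Java sketch, not Flink code; whether
> > > sun.misc.Unsafe is accessible like this depends on the JDK setup):
> > > allocations made through direct ByteBuffers are counted against
> > > -XX:MaxDirectMemorySize, while raw native allocations bypass that
> > > counter, which is why the cap is hard to set when the two are not
> > > differentiated.
> > >
> > >     import java.lang.reflect.Field;
> > >     import java.nio.ByteBuffer;
> > >     import sun.misc.Unsafe;
> > >
> > >     public class TrackedVsUntracked {
> > >         public static void main(String[] args) throws Exception {
> > >             // Counted against -XX:MaxDirectMemorySize.
> > >             ByteBuffer tracked = ByteBuffer.allocateDirect(64 * 1024 * 1024);
> > >
> > >             // Raw native allocation: invisible to the direct memory cap.
> > >             Field f = Unsafe.class.getDeclaredField("theUnsafe");
> > >             f.setAccessible(true);
> > >             Unsafe unsafe = (Unsafe) f.get(null);
> > >             long untracked = unsafe.allocateMemory(64 * 1024 * 1024);
> > >             unsafe.freeMemory(untracked);
> > >         }
> > >     }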
> > >
> > >
> > >
> > > > Memory Calculation
> > >
> > > I agree with Xintong. For Yarn and K8s, we need to check the memory
> > > configurations on the client side to avoid submitting successfully and
> > > then failing in the Flink master.
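> > >
> > > For illustration, a minimal sketch of such a client-side check (the class,
> > > method and parameter names are made up for this example; they are not
> > > actual Flink options or APIs):
> > >
> > >     class MemoryConfigCheck {
> > >         // Fail fast on the client if the explicitly configured
> > >         // fine-grained pools cannot fit into the total process memory.
> > >         static void check(long totalProcessMb, long... explicitPoolsMb) {
> > >             long sum = java.util.Arrays.stream(explicitPoolsMb).sum();
> > >             if (sum > totalProcessMb) {
> > >                 throw new IllegalArgumentException(
> > >                         "Sum of configured memory pools (" + sum
> > >                                 + " MB) exceeds total process memory ("
> > >                                 + totalProcessMb + " MB)");
> > >             }
> > >         }
> > >     }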
> > >
> > >
> > > Best,
> > >
> > > Yang
> > >
> > > Xintong Song <tonysong...@gmail.com> wrote on Tue, Aug 13, 2019, 22:07:
> > >
> > > > Thanks for replying, Till.
> > > >
> > > > About MemorySegment, I think you are right that we should not include
> > > > this issue in the scope of this FLIP. This FLIP should concentrate on
> > > > how to configure memory pools for TaskExecutors, with minimal
> > > > involvement in how memory consumers use them.
> > > >
> > > > About direct memory, I think alternative 3 may not have the same
> > > > over-reservation issue that alternative 2 does, but at the cost of
> > > > risking overuse of memory at the container level, which is not good. My
> > > > point is that both "Task Off-Heap Memory" and "JVM Overhead" are not
> > > > easy to configure. For alternative 2, users might configure them higher
> > > > than what is actually needed, just to avoid getting a direct OOM. For
> > > > alternative 3, users do not get a direct OOM, so they may not configure
> > > > the two options aggressively high. But the consequence is the risk that
> > > > the overall container memory usage exceeds the budget.
> > > >
> > > > Thank you~
> > > >
> > > > Xintong Song
> > > >
> > > >
> > > >
> > > > On Tue, Aug 13, 2019 at 9:39 AM Till Rohrmann <trohrm...@apache.org>
> > > > wrote:
> > > >
> > > > > Thanks for proposing this FLIP Xintong.
> > > > >
> > > > > All in all I think it already looks quite good. Concerning the first
> > > > > open question about allocating memory segments, I was wondering
> > > > > whether this is strictly necessary to do in the context of this FLIP
> > > > > or whether this could be done as a follow up? Without knowing all
> > > > > details, I would be concerned that we would widen the scope of this
> > > > > FLIP too much because we would have to touch all the existing call
> > > > > sites of the MemoryManager where we allocate memory segments (this
> > > > > should mainly be batch operators). The addition of the memory
> > > > > reservation call to the MemoryManager should not be affected by this
> > > > > and I would hope that this is the only point of interaction a
> > > > > streaming job would have with the MemoryManager.
> > > > >
> > > > > Concerning the second open question about setting or not setting a
> > > > > max direct memory limit, I would also be interested in why Yang Wang
> > > > > thinks leaving it open would be best. My concern about this would be
> > > > > that we would be in a similar situation as we are now with the
> > > > > RocksDBStateBackend. If the different memory pools are not clearly
> > > > > separated and can spill over to a different pool, then it is quite
> > > > > hard to understand what exactly causes a process to get killed for
> > > > > using too much memory. This could then easily lead to a similar
> > > > > situation to what we have with the cutoff-ratio. So why not set a
> > > > > sane default value for max direct memory and give the user an option
> > > > > to increase it if they run into an OOM?
> > > > >
> > > > > @Xintong, how would alternative 2 lead to lower memory utilization
> > > > > than alternative 3 where we set the direct memory to a higher value?
> > > > >
> > > > > Cheers,
> > > > > Till
> > > > >
> > > > > On Fri, Aug 9, 2019 at 9:12 AM Xintong Song <tonysong...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Thanks for the feedback, Yang.
> > > > > >
> > > > > > Regarding your comments:
> > > > > >
> > > > > > *Native and Direct Memory*
> > > > > > I think setting a very large max direct memory size definitely has
> > > > > > some good sides. E.g., we do not worry about direct OOM, and we
> > > > > > don't even need to allocate managed / network memory with
> > > > > > Unsafe.allocate(). However, there are also some downsides of doing
> > > > > > this.
> > > > > >
> > > > > >    - One thing I can think of is that if a task executor container
> > > > > >    is killed due to overusing memory, it could be hard for us to
> > > > > >    know which part of the memory is overused.
> > > > > >    - Another downside is that the JVM never triggers GC due to
> > > > > >    reaching the max direct memory limit, because the limit is too
> > > > > >    high to be reached. That means we kind of rely on heap memory to
> > > > > >    trigger GC and release direct memory. That could be a problem in
> > > > > >    cases where we have more direct memory usage but not enough heap
> > > > > >    activity to trigger the GC (see the sketch after this list).
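> > > > > >
> > > > > > For illustration only (a plain-Java sketch of the HotSpot/OpenJDK
> > > > > > behavior as I understand it, not Flink code): the native memory
> > > > > > behind a direct ByteBuffer is only released once its small heap
> > > > > > wrapper object is garbage collected, and the fallback GC that the
> > > > > > JDK triggers on a failed direct allocation only fires when the
> > > > > > -XX:MaxDirectMemorySize limit can actually be hit.
> > > > > >
> > > > > >     import java.nio.ByteBuffer;
> > > > > >
> > > > > >     public class DirectGcCoupling {
> > > > > >         public static void main(String[] args) {
> > > > > >             ByteBuffer buf = ByteBuffer.allocateDirect(64 * 1024 * 1024);
> > > > > >             // The tiny heap wrapper becomes unreachable here, but the
> > > > > >             // 64 MB of native memory is freed only after a GC actually
> > > > > >             // runs; with little heap activity that may be much later.
> > > > > >             buf = null;
> > > > > >             System.gc(); // explicit GC (or heap pressure) releases it
> > > > > >         }
> > > > > >     }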
> > > > > >
> > > > > > Maybe you can share your reasons for preferring to set a very large
> > > > > > value, in case there is anything else I overlooked.
> > > > > >
> > > > > > *Memory Calculation*
> > > > > > If there is any conflict between multiple configurations that the
> > > > > > user explicitly specified, I think we should throw an error.
> > > > > > I think doing the checking on the client side is a good idea, so
> > > > > > that on Yarn / K8s we can discover the problem before submitting
> > > > > > the Flink cluster, which is always a good thing.
> > > > > > But we cannot only rely on the client-side checking, because for
> > > > > > standalone clusters, TaskManagers on different machines may have
> > > > > > different configurations and the client does not see them.
> > > > > > What do you think?
> > > > > >
> > > > > > Thank you~
> > > > > >
> > > > > > Xintong Song
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Thu, Aug 8, 2019 at 5:09 PM Yang Wang <danrtsey...@gmail.com>
> > > > wrote:
> > > > > >
> > > > > > > Hi Xintong,
> > > > > > >
> > > > > > >
> > > > > > > Thanks for your detailed proposal. After all the memory
> > > > > > > configurations are introduced, it will be more powerful to control
> > > > > > > the Flink memory usage. I just have a few questions about it.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >    - Native and Direct Memory
> > > > > > >
> > > > > > > We do not differentiate user direct memory and native memory.
> > > > > > > They are all included in task off-heap memory, right? So I don't
> > > > > > > think we can set the -XX:MaxDirectMemorySize properly. I prefer
> > > > > > > leaving it at a very large value.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >    - Memory Calculation
> > > > > > >
> > > > > > > If the sum of the fine-grained memory (network memory, managed
> > > > > > > memory, etc.) is larger than the total process memory, how do we
> > > > > > > deal with this situation? Do we need to check the memory
> > > > > > > configuration on the client?
> > > > > > >
> > > > > > > Xintong Song <tonysong...@gmail.com> wrote on Wed, Aug 7, 2019, 10:14 PM:
> > > > > > >
> > > > > > > > Hi everyone,
> > > > > > > >
> > > > > > > > We would like to start a discussion thread on "FLIP-49: Unified
> > > > > > > > Memory Configuration for TaskExecutors"[1], where we describe
> > > > > > > > how to improve TaskExecutor memory configurations. The FLIP
> > > > > > > > document is mostly based on an early design "Memory Management
> > > > > > > > and Configuration Reloaded"[2] by Stephan, with updates from
> > > > > > > > follow-up discussions both online and offline.
> > > > > > > >
> > > > > > > > This FLIP addresses several shortcomings of the current (Flink
> > > > > > > > 1.9) TaskExecutor memory configuration.
> > > > > > > >
> > > > > > > >    - Different configuration for Streaming and Batch.
> > > > > > > >    - Complex and difficult configuration of RocksDB in Streaming.
> > > > > > > >    - Complicated, uncertain and hard to understand.
> > > > > > > >
> > > > > > > >
> > > > > > > > Key changes to solve the problems can be summarized as follows.
> > > > > > > >
> > > > > > > >    - Extend the memory manager to also account for memory usage
> > > > > > > >    by state backends.
> > > > > > > >    - Modify how TaskExecutor memory is partitioned into
> > > > > > > >    accounted individual memory reservations and pools.
> > > > > > > >    - Simplify memory configuration options and calculation logic.
> > > > > > > >
> > > > > > > >
> > > > > > > > Please find more details in the FLIP wiki document [1].
> > > > > > > >
> > > > > > > > (Please note that the early design doc [2] is out of sync; we
> > > > > > > > would appreciate having the discussion in this mailing list
> > > > > > > > thread.)
> > > > > > > >
> > > > > > > >
> > > > > > > > Looking forward to your feedback.
> > > > > > > >
> > > > > > > >
> > > > > > > > Thank you~
> > > > > > > >
> > > > > > > > Xintong Song
> > > > > > > >
> > > > > > > >
> > > > > > > > [1]
> > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors
> > > > > > > >
> > > > > > > > [2]
> > > > > > > > https://docs.google.com/document/d/1o4KvyyXsQMGUastfPin3ZWeUXWsJgoL7piqp1fFYJvA/edit?usp=sharing
> > > > > > > >
