@Stephan
Not sure what do you mean by "just having this value". Are you suggesting
that having "taskmanager.memory.network" refers to "shuffle" and "other
network memory", or only "shuffle"?

I guess what you mean is only "shuffle"? Because currently
"taskmanager.network.memory" refers to shuffle buffers only, which is "one
less config value to break".

Thank you~

Xintong Song



On Wed, Sep 4, 2019 at 3:42 PM Stephan Ewen <se...@apache.org> wrote:

> If we later split the network memory into "shuffle" and "other network
> memory", I think it would make sense to split the option then.
>
> In that case "taskmanager.memory.network" would probably refer to the total
> network memory, which would most likely be what most users actually
> configure.
> My feeling is that for now just having this value is actually easier, and
> it is one less config value to break (which is also good).
>
> On Wed, Sep 4, 2019 at 9:05 AM Xintong Song <tonysong...@gmail.com> wrote:
>
> > Thanks for the voting and comments.
> >
> > @Stephan
> > - The '-XX:MaxDirectMemorySize' value should not include JVM Overhead.
> > Thanks for correction.
> > - 'taskmanager.memory.framework.heap' it heap memory reserved for task
> > executor framework, which can not be allocated to task slots. I think
> users
> > should be able to configure both how many total java heap memory a task
> > executor should have and how many of the total java heap memory can be
> > allocated to task slots.
> > - Regarding network / shuffle memory, I'm with @Andrey. ATM, this memory
> > pool (derived from "taskmanager.network.memory.[min/max/fraction]") is
> only
> > used inside NettyShuffleEnvironment as network buffers. There might be
> > other network memory usage outside the shuffle component (as @Zhijiang
> also
> > suggested), but that is not accounted by this memory pool. This is
> exactly
> > why I would suggest to change the name from 'network' to 'shuffle'.
> > - I agree that we need very good documentation to explain the memory
> pools
> > and config options, as well as WebUI to present the memory pool sizes. I
> > would suggest to address these as follow-ups of all the three resource
> > management FLIPs, for better integration.
> >
> > @Andrey
> > - Agree with the 'shuffle' naming. See above.
> >
> > @Till
> > - My understanding is that Task Off-heap memory accounts for both direct
> > and native memory used by the user code. I'm not sure whether we need
> > another configure option to split it. Given that we only decided (in the
> > discussion thread) to try it out the way we set '-XX:MaxDirectMemorySize'
> > in current design and may switch to other alternatives if it doesn't work
> > out well, I would suggest the same for this one. I suggest that we first
> > try it without the splitting config option, and we can easily add it if
> it
> > doesn't work well.
> > - Agree that it's really important to have good documentation for this.
> See
> > above.
> >
> > @Zhijiang
> > - Thanks for the input. My understanding is that 'shuffle memory' is a
> > portion of the task executor memory reserved for the shuffle component.
> The
> > way shuffle component use these memory (local buffer pool, netty internal
> > memory, etc.) can be different depending on the shuffle implementation.
> The
> > task executor (outside of the shuffle implementation) should only know
> the
> > overall memory usage of the shuffle component but no need to understand
> > more details inside the shuffle implementation.
> >
> > Thank you~
> >
> > Xintong Song
> >
> >
> >
> > On Tue, Sep 3, 2019 at 10:41 PM zhijiang <wangzhijiang...@aliyun.com
> > .invalid>
> > wrote:
> >
> > > Thanks for proposing this FLIP and also +1 on my side.
> > >
> > > @Andrey Zagrebin For the point of "network memory is actually used more
> > > than shuffling", I guess that the component of queryable state is also
> > > using network/netty stack atm, which is outside of shuffling.
> > > In addition, if we only consider the shuffle memory provided by shuffle
> > > service interface, we should not only consider the memory used by local
> > > buffer pool, but also consider the netty internal memory
> > > usages as the overhead, especially we have not the zero-copy
> improvement
> > > on dowstream read side. This issue might be out of the vote scope, just
> > > think of we have this issue in [1]. :)
> > >
> > > [1] https://issues.apache.org/jira/browse/FLINK-12110
> > >
> > > Best,
> > > Zhijiang
> > > ------------------------------------------------------------------
> > > From:Till Rohrmann <trohrm...@apache.org>
> > > Send Time:2019年9月3日(星期二) 15:07
> > > To:dev <dev@flink.apache.org>
> > > Subject:Re: [VOTE] FLIP-49: Unified Memory Configuration for
> > TaskExecutors
> > >
> > > Thanks for creating this FLIP and starting the vote Xintong.
> > >
> > > +1 for the proposal from my side.
> > >
> > > I agree with Stephan that we might wanna revisit some of the
> > configuration
> > > names.
> > >
> > > If I understood it correctly, then Task Off-heap memory represents the
> > > direct memory used by the user code, right? How would users configure
> > > native memory requirements for the user code? If it is part of Task Off
> > > heap memory, then we need to split it to set -XX:MaxDirectMemorySize
> > > correctly or to introduce another configuration option.
> > >
> > > Given all these configuration options, I can see that users will get
> > > confused quite easily. Therefore, I would like to emphasise that we
> need
> > a
> > > very good documentation about how to properly configure Flink processes
> > and
> > > which knobs to turn in which cases.
> > >
> > > Cheers,
> > > Till
> > >
> > > On Tue, Sep 3, 2019 at 2:34 PM Andrey Zagrebin <and...@ververica.com>
> > > wrote:
> > >
> > > > Thanks for starting the vote Xintong
> > > >
> > > > Also +1 for the proposed FLIP-49.
> > > >
> > > > @Stephan regarding namings: network vs shuffle.
> > > > My understanding so far was that the network memory is what we
> > basically
> > > > give to Shuffle implementations and default netty implementation uses
> > it
> > > in
> > > > particular mostly for networking.
> > > > Are the network pools used for something else outside of the
> shuffling
> > > > scope?
> > > >
> > > > best,
> > > > Andrey
> > > >
> > > > On Tue, Sep 3, 2019 at 1:01 PM Stephan Ewen <se...@apache.org>
> wrote:
> > > >
> > > > > +1 to the proposal in general
> > > > >
> > > > > A few things seems to be a bit put of sync with the latest
> > discussions
> > > > > though.
> > > > >
> > > > > The section about JVM Parameters states that the
> > > > > -XX:MaxDirectMemorySize value is set to Task Off-heap Memory,
> Shuffle
> > > > > Memory and JVM Overhead.
> > > > > The way I understand the last discussion conclusion is that it is
> > only
> > > > the
> > > > > sum of shuffle memory and user-defined direct memory.
> > > > >
> > > > > I am someone neutral but unsure about is the separation between
> > > > > "taskmanager.memory.framework.heap" and
> > "taskmanager.memory.task.heap".
> > > > > Could that be simply combined under "taskmanager.memory.javaheap"?
> > > > >
> > > > > It might be good to also expose these values somehow in the web UI
> so
> > > > that
> > > > > users see immediately what amount of memory TMs assume to use for
> > what.
> > > > >
> > > > > I assume config key names and default values might be adjusted over
> > > time
> > > > as
> > > > > we get feedback.
> > > > >   - I would keep the network memory under the name
> > > > > "taskmanager.memory.network". Because network memory is actually
> used
> > > for
> > > > > more than shuffling. Also, the old config key seems good, so why
> > change
> > > > it?
> > > > >
> > > > > One thing to be aware of is that often, the Java Heap is understood
> > as
> > > > > "managed memory" as a whole, because it is managed by the GC not
> > > > explicitly
> > > > > by the user.
> > > > > So we need to make sure that we don't confuse users by speaking of
> > > > managed
> > > > > heap and unmanaged heap. All heap is managed in Java. Some memory
> is
> > > > > explicitly managed by Flink.
> > > > >
> > > > > Best,
> > > > > Stephan
> > > > >
> > > > >
> > > > > On Mon, Sep 2, 2019 at 3:08 PM Xintong Song <tonysong...@gmail.com
> >
> > > > wrote:
> > > > >
> > > > > > Hi everyone,
> > > > > >
> > > > > > I'm here to re-start the voting process for FLIP-49 [1], with
> > respect
> > > > to
> > > > > > consensus reached in this thread [2] regarding some new comments
> > and
> > > > > > concerns.
> > > > > >
> > > > > > This voting will be open for at least 72 hours. I'll try to close
> > it
> > > > Sep.
> > > > > > 5, 14:00 UTC, unless there is an objection or not enough votes.
> > > > > >
> > > > > > Thank you~
> > > > > >
> > > > > > Xintong Song
> > > > > >
> > > > > >
> > > > > > [1]
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors
> > > > > > [2]
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-49-Unified-Memory-Configuration-for-TaskExecutors-td31436.html
> > > > > >
> > > > > > On Tue, Aug 27, 2019 at 9:29 PM Xintong Song <
> > tonysong...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Alright, then let's keep the discussion in the DISCUSS mailing
> > > > thread,
> > > > > > and
> > > > > > > see whether we need to restart the vote.
> > > > > > >
> > > > > > > Thank you~
> > > > > > >
> > > > > > > Xintong Song
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Aug 27, 2019 at 8:12 PM Till Rohrmann <
> > > trohrm...@apache.org>
> > > > > > > wrote:
> > > > > > >
> > > > > > >> I had a couple of comments concerning the implementation plan.
> > > I've
> > > > > > posted
> > > > > > >> them to the original discussion thread. Depending on the
> outcome
> > > of
> > > > > this
> > > > > > >> discussion we might need to restart the vote.
> > > > > > >>
> > > > > > >> Cheers,
> > > > > > >> Till
> > > > > > >>
> > > > > > >> On Tue, Aug 27, 2019 at 11:14 AM Xintong Song <
> > > > tonysong...@gmail.com>
> > > > > > >> wrote:
> > > > > > >>
> > > > > > >> > Hi all,
> > > > > > >> >
> > > > > > >> > I would like to start the voting process for FLIP-49 [1],
> > which
> > > is
> > > > > > >> > discussed and reached consensus in this thread [2].
> > > > > > >> >
> > > > > > >> > This voting will be open for at least 72 hours. I'll try to
> > > close
> > > > it
> > > > > > >> Aug.
> > > > > > >> > 30 10:00 UTC, unless there is an objection or not enough
> > votes.
> > > > > > >> >
> > > > > > >> > Thank you~
> > > > > > >> >
> > > > > > >> > Xintong Song
> > > > > > >> >
> > > > > > >> >
> > > > > > >> > [1]
> > > > > > >> >
> > > > > > >> >
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors
> > > > > > >> > [2]
> > > > > > >> >
> > > > > > >> >
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-49-Unified-Memory-Configuration-for-TaskExecutors-td31436.html
> > > > > > >> >
> > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > >
> >
>

Reply via email to