Maybe to clear up confusion about my suggestion:

I would vote to keep the name of the config parameter
"taskmanager.memory.network" because it is the same key as currently (good
to not break things unless good reason) and there currently is no case or
even a concrete follow-up where we would actually differentiate between
different types of network memory.

I would suggest to not prematurely rename this because of something that
might happen in the future. Experience shows that its better to do these
things when the actual change comes.

Side note: I am not sure "shuffle" is a good term in this context. I have
so far only heard that in batch contexts, which is not what we do here. One
more reason for me to not pre-maturely change names.

On Wed, Sep 4, 2019 at 10:56 AM Xintong Song <tonysong...@gmail.com> wrote:

> @till
>
> > Just to clarify Xintong, you suggest that Task off-heap memory represents
> > direct and native memory. Since we don't know how the user will allocate
> > the memory we will add this value to -XX:MaxDirectMemorySize so that the
> > process won't fail if the user allocates only direct memory and no native
> > memory. Is that correct?
> >
> Yes, this is what I mean.
>
>
> Thank you~
>
> Xintong Song
>
>
>
> On Wed, Sep 4, 2019 at 4:25 PM Till Rohrmann <trohrm...@apache.org> wrote:
>
> > Just to clarify Xintong, you suggest that Task off-heap memory represents
> > direct and native memory. Since we don't know how the user will allocate
> > the memory we will add this value to -XX:MaxDirectMemorySize so that the
> > process won't fail if the user allocates only direct memory and no native
> > memory. Is that correct?
> >
> > Cheers,
> > Till
> >
> > On Wed, Sep 4, 2019 at 10:18 AM Xintong Song <tonysong...@gmail.com>
> > wrote:
> >
> > > @Stephan
> > > Not sure what do you mean by "just having this value". Are you
> suggesting
> > > that having "taskmanager.memory.network" refers to "shuffle" and "other
> > > network memory", or only "shuffle"?
> > >
> > > I guess what you mean is only "shuffle"? Because currently
> > > "taskmanager.network.memory" refers to shuffle buffers only, which is
> > "one
> > > less config value to break".
> > >
> > > Thank you~
> > >
> > > Xintong Song
> > >
> > >
> > >
> > > On Wed, Sep 4, 2019 at 3:42 PM Stephan Ewen <se...@apache.org> wrote:
> > >
> > > > If we later split the network memory into "shuffle" and "other
> network
> > > > memory", I think it would make sense to split the option then.
> > > >
> > > > In that case "taskmanager.memory.network" would probably refer to the
> > > total
> > > > network memory, which would most likely be what most users actually
> > > > configure.
> > > > My feeling is that for now just having this value is actually easier,
> > and
> > > > it is one less config value to break (which is also good).
> > > >
> > > > On Wed, Sep 4, 2019 at 9:05 AM Xintong Song <tonysong...@gmail.com>
> > > wrote:
> > > >
> > > > > Thanks for the voting and comments.
> > > > >
> > > > > @Stephan
> > > > > - The '-XX:MaxDirectMemorySize' value should not include JVM
> > Overhead.
> > > > > Thanks for correction.
> > > > > - 'taskmanager.memory.framework.heap' it heap memory reserved for
> > task
> > > > > executor framework, which can not be allocated to task slots. I
> think
> > > > users
> > > > > should be able to configure both how many total java heap memory a
> > task
> > > > > executor should have and how many of the total java heap memory can
> > be
> > > > > allocated to task slots.
> > > > > - Regarding network / shuffle memory, I'm with @Andrey. ATM, this
> > > memory
> > > > > pool (derived from "taskmanager.network.memory.[min/max/fraction]")
> > is
> > > > only
> > > > > used inside NettyShuffleEnvironment as network buffers. There might
> > be
> > > > > other network memory usage outside the shuffle component (as
> > @Zhijiang
> > > > also
> > > > > suggested), but that is not accounted by this memory pool. This is
> > > > exactly
> > > > > why I would suggest to change the name from 'network' to 'shuffle'.
> > > > > - I agree that we need very good documentation to explain the
> memory
> > > > pools
> > > > > and config options, as well as WebUI to present the memory pool
> > sizes.
> > > I
> > > > > would suggest to address these as follow-ups of all the three
> > resource
> > > > > management FLIPs, for better integration.
> > > > >
> > > > > @Andrey
> > > > > - Agree with the 'shuffle' naming. See above.
> > > > >
> > > > > @Till
> > > > > - My understanding is that Task Off-heap memory accounts for both
> > > direct
> > > > > and native memory used by the user code. I'm not sure whether we
> need
> > > > > another configure option to split it. Given that we only decided
> (in
> > > the
> > > > > discussion thread) to try it out the way we set
> > > '-XX:MaxDirectMemorySize'
> > > > > in current design and may switch to other alternatives if it
> doesn't
> > > work
> > > > > out well, I would suggest the same for this one. I suggest that we
> > > first
> > > > > try it without the splitting config option, and we can easily add
> it
> > if
> > > > it
> > > > > doesn't work well.
> > > > > - Agree that it's really important to have good documentation for
> > this.
> > > > See
> > > > > above.
> > > > >
> > > > > @Zhijiang
> > > > > - Thanks for the input. My understanding is that 'shuffle memory'
> is
> > a
> > > > > portion of the task executor memory reserved for the shuffle
> > component.
> > > > The
> > > > > way shuffle component use these memory (local buffer pool, netty
> > > internal
> > > > > memory, etc.) can be different depending on the shuffle
> > implementation.
> > > > The
> > > > > task executor (outside of the shuffle implementation) should only
> > know
> > > > the
> > > > > overall memory usage of the shuffle component but no need to
> > understand
> > > > > more details inside the shuffle implementation.
> > > > >
> > > > > Thank you~
> > > > >
> > > > > Xintong Song
> > > > >
> > > > >
> > > > >
> > > > > On Tue, Sep 3, 2019 at 10:41 PM zhijiang <
> wangzhijiang...@aliyun.com
> > > > > .invalid>
> > > > > wrote:
> > > > >
> > > > > > Thanks for proposing this FLIP and also +1 on my side.
> > > > > >
> > > > > > @Andrey Zagrebin For the point of "network memory is actually
> used
> > > more
> > > > > > than shuffling", I guess that the component of queryable state is
> > > also
> > > > > > using network/netty stack atm, which is outside of shuffling.
> > > > > > In addition, if we only consider the shuffle memory provided by
> > > shuffle
> > > > > > service interface, we should not only consider the memory used by
> > > local
> > > > > > buffer pool, but also consider the netty internal memory
> > > > > > usages as the overhead, especially we have not the zero-copy
> > > > improvement
> > > > > > on dowstream read side. This issue might be out of the vote
> scope,
> > > just
> > > > > > think of we have this issue in [1]. :)
> > > > > >
> > > > > > [1] https://issues.apache.org/jira/browse/FLINK-12110
> > > > > >
> > > > > > Best,
> > > > > > Zhijiang
> > > > > >
> ------------------------------------------------------------------
> > > > > > From:Till Rohrmann <trohrm...@apache.org>
> > > > > > Send Time:2019年9月3日(星期二) 15:07
> > > > > > To:dev <dev@flink.apache.org>
> > > > > > Subject:Re: [VOTE] FLIP-49: Unified Memory Configuration for
> > > > > TaskExecutors
> > > > > >
> > > > > > Thanks for creating this FLIP and starting the vote Xintong.
> > > > > >
> > > > > > +1 for the proposal from my side.
> > > > > >
> > > > > > I agree with Stephan that we might wanna revisit some of the
> > > > > configuration
> > > > > > names.
> > > > > >
> > > > > > If I understood it correctly, then Task Off-heap memory
> represents
> > > the
> > > > > > direct memory used by the user code, right? How would users
> > configure
> > > > > > native memory requirements for the user code? If it is part of
> Task
> > > Off
> > > > > > heap memory, then we need to split it to set
> > -XX:MaxDirectMemorySize
> > > > > > correctly or to introduce another configuration option.
> > > > > >
> > > > > > Given all these configuration options, I can see that users will
> > get
> > > > > > confused quite easily. Therefore, I would like to emphasise that
> we
> > > > need
> > > > > a
> > > > > > very good documentation about how to properly configure Flink
> > > processes
> > > > > and
> > > > > > which knobs to turn in which cases.
> > > > > >
> > > > > > Cheers,
> > > > > > Till
> > > > > >
> > > > > > On Tue, Sep 3, 2019 at 2:34 PM Andrey Zagrebin <
> > and...@ververica.com
> > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Thanks for starting the vote Xintong
> > > > > > >
> > > > > > > Also +1 for the proposed FLIP-49.
> > > > > > >
> > > > > > > @Stephan regarding namings: network vs shuffle.
> > > > > > > My understanding so far was that the network memory is what we
> > > > > basically
> > > > > > > give to Shuffle implementations and default netty
> implementation
> > > uses
> > > > > it
> > > > > > in
> > > > > > > particular mostly for networking.
> > > > > > > Are the network pools used for something else outside of the
> > > > shuffling
> > > > > > > scope?
> > > > > > >
> > > > > > > best,
> > > > > > > Andrey
> > > > > > >
> > > > > > > On Tue, Sep 3, 2019 at 1:01 PM Stephan Ewen <se...@apache.org>
> > > > wrote:
> > > > > > >
> > > > > > > > +1 to the proposal in general
> > > > > > > >
> > > > > > > > A few things seems to be a bit put of sync with the latest
> > > > > discussions
> > > > > > > > though.
> > > > > > > >
> > > > > > > > The section about JVM Parameters states that the
> > > > > > > > -XX:MaxDirectMemorySize value is set to Task Off-heap Memory,
> > > > Shuffle
> > > > > > > > Memory and JVM Overhead.
> > > > > > > > The way I understand the last discussion conclusion is that
> it
> > is
> > > > > only
> > > > > > > the
> > > > > > > > sum of shuffle memory and user-defined direct memory.
> > > > > > > >
> > > > > > > > I am someone neutral but unsure about is the separation
> between
> > > > > > > > "taskmanager.memory.framework.heap" and
> > > > > "taskmanager.memory.task.heap".
> > > > > > > > Could that be simply combined under
> > > "taskmanager.memory.javaheap"?
> > > > > > > >
> > > > > > > > It might be good to also expose these values somehow in the
> web
> > > UI
> > > > so
> > > > > > > that
> > > > > > > > users see immediately what amount of memory TMs assume to use
> > for
> > > > > what.
> > > > > > > >
> > > > > > > > I assume config key names and default values might be
> adjusted
> > > over
> > > > > > time
> > > > > > > as
> > > > > > > > we get feedback.
> > > > > > > >   - I would keep the network memory under the name
> > > > > > > > "taskmanager.memory.network". Because network memory is
> > actually
> > > > used
> > > > > > for
> > > > > > > > more than shuffling. Also, the old config key seems good, so
> > why
> > > > > change
> > > > > > > it?
> > > > > > > >
> > > > > > > > One thing to be aware of is that often, the Java Heap is
> > > understood
> > > > > as
> > > > > > > > "managed memory" as a whole, because it is managed by the GC
> > not
> > > > > > > explicitly
> > > > > > > > by the user.
> > > > > > > > So we need to make sure that we don't confuse users by
> speaking
> > > of
> > > > > > > managed
> > > > > > > > heap and unmanaged heap. All heap is managed in Java. Some
> > memory
> > > > is
> > > > > > > > explicitly managed by Flink.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Stephan
> > > > > > > >
> > > > > > > >
> > > > > > > > On Mon, Sep 2, 2019 at 3:08 PM Xintong Song <
> > > tonysong...@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi everyone,
> > > > > > > > >
> > > > > > > > > I'm here to re-start the voting process for FLIP-49 [1],
> with
> > > > > respect
> > > > > > > to
> > > > > > > > > consensus reached in this thread [2] regarding some new
> > > comments
> > > > > and
> > > > > > > > > concerns.
> > > > > > > > >
> > > > > > > > > This voting will be open for at least 72 hours. I'll try to
> > > close
> > > > > it
> > > > > > > Sep.
> > > > > > > > > 5, 14:00 UTC, unless there is an objection or not enough
> > votes.
> > > > > > > > >
> > > > > > > > > Thank you~
> > > > > > > > >
> > > > > > > > > Xintong Song
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > [1]
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors
> > > > > > > > > [2]
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-49-Unified-Memory-Configuration-for-TaskExecutors-td31436.html
> > > > > > > > >
> > > > > > > > > On Tue, Aug 27, 2019 at 9:29 PM Xintong Song <
> > > > > tonysong...@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Alright, then let's keep the discussion in the DISCUSS
> > > mailing
> > > > > > > thread,
> > > > > > > > > and
> > > > > > > > > > see whether we need to restart the vote.
> > > > > > > > > >
> > > > > > > > > > Thank you~
> > > > > > > > > >
> > > > > > > > > > Xintong Song
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Tue, Aug 27, 2019 at 8:12 PM Till Rohrmann <
> > > > > > trohrm...@apache.org>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > >> I had a couple of comments concerning the implementation
> > > plan.
> > > > > > I've
> > > > > > > > > posted
> > > > > > > > > >> them to the original discussion thread. Depending on the
> > > > outcome
> > > > > > of
> > > > > > > > this
> > > > > > > > > >> discussion we might need to restart the vote.
> > > > > > > > > >>
> > > > > > > > > >> Cheers,
> > > > > > > > > >> Till
> > > > > > > > > >>
> > > > > > > > > >> On Tue, Aug 27, 2019 at 11:14 AM Xintong Song <
> > > > > > > tonysong...@gmail.com>
> > > > > > > > > >> wrote:
> > > > > > > > > >>
> > > > > > > > > >> > Hi all,
> > > > > > > > > >> >
> > > > > > > > > >> > I would like to start the voting process for FLIP-49
> > [1],
> > > > > which
> > > > > > is
> > > > > > > > > >> > discussed and reached consensus in this thread [2].
> > > > > > > > > >> >
> > > > > > > > > >> > This voting will be open for at least 72 hours. I'll
> try
> > > to
> > > > > > close
> > > > > > > it
> > > > > > > > > >> Aug.
> > > > > > > > > >> > 30 10:00 UTC, unless there is an objection or not
> enough
> > > > > votes.
> > > > > > > > > >> >
> > > > > > > > > >> > Thank you~
> > > > > > > > > >> >
> > > > > > > > > >> > Xintong Song
> > > > > > > > > >> >
> > > > > > > > > >> >
> > > > > > > > > >> > [1]
> > > > > > > > > >> >
> > > > > > > > > >> >
> > > > > > > > > >>
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors
> > > > > > > > > >> > [2]
> > > > > > > > > >> >
> > > > > > > > > >> >
> > > > > > > > > >>
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-49-Unified-Memory-Configuration-for-TaskExecutors-td31436.html
> > > > > > > > > >> >
> > > > > > > > > >>
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Reply via email to