Hi all,

Stephan, Till and me had another offline discussion today. Here is the
outcome of our brainstorm.

We agreed to have process.size in the default settings with the explanation
of flink.size alternative in the comment.
This way we keep -tm as a shortcut to process.size only and any
inconsistencies fail fast as if configured in yaml.
I will also follow-up on the thread "[Discuss] Tuning FLIP-49 configuration
default values" with a bit more details.

If no further objections, we can conclude this last point in this
discussion.

Best,
Andrey

On Tue, Jan 14, 2020 at 4:28 PM Stephan Ewen <se...@apache.org> wrote:

> I think that problem exists anyways and is independent of the "-tm" option.
>
> You can have a combination of `task.heap.size` and `managed.size` and
> `framework.heap.size` that conflicts with `flink.size`. In that case, we
> get an exception during the startup (incompatible memory configuration)?
> That is the price for having these "shortcut" options (`flink.size` and
> `process.size`). But it is a fair price, because the shortcuts are very
> valuable to most users.
>
> That is added with the "-tm" setting is that this is an easy way to get the
> shortcut setting added, even if it was not in the config. So where a config
> worked in standalone, it can now fail in a container setup when "-tm" is
> used.
> But that is expected, I guess, when you start manually tune different
> memory types and end up defining an inconsistent combination. And it never
> conflicts with the default setup, only with manually tuned regions.
>
> But this example shows why we need to have log output for the logic that
> validates and configures the memory settings, so that users understand what
> is happening.
>
> Best,
> Stephan
>
>
> On Tue, Jan 14, 2020 at 2:54 PM Till Rohrmann <trohrm...@apache.org>
> wrote:
>
> > Clearing the `flink.size` option and setting `process.size` could indeed
> be
> > a solution. The thing I'm wondering is what would happen if the user has
> > configured `task.heap.size` and `managed.size` instead of `flink.size`?
> > Would we also ignore these settings? If not, then we risk to run into the
> > situation that TaskExecutorResourceUtils fails because the memory does
> not
> > add up. I guess the thing which bugs me a bit is the special casing which
> > could lead easily into inconsistent behaviour if we don't cover all
> cases.
> > The consequence of using slightly different concepts (`flink.size`,
> > `process.size`) in standalone vs. container/Yarn/Mesos mode in order to
> > keep the status quo is an increased maintenance overhead which we should
> be
> > aware of.
> >
> > Cheers,
> > Till
> >
> > On Tue, Jan 14, 2020 at 3:59 AM Xintong Song <tonysong...@gmail.com>
> > wrote:
> >
> > > True, even we have "process.size" rather than "flink.size" in the
> default
> > > config file, user can still have "flink.size" in their custom config
> > file.
> > > If we consider "-tm" as a shortcut for users to override the TM memory
> > > size, then it makes less sense to require users to remove "flink.size"
> > from
> > > their config file whenever then want to use "-tm".
> > >
> > > I'm convinced that ignoring "flink.size" might not be a bad idea.
> > > And I think we should also update the description of "-tm" (in
> > > "FlinkYarnSessionCli") to explicitly mention that it would overwrite
> > > "flink.size" and "process.size".
> > >
> > > Thank you~
> > >
> > > Xintong Song
> > >
> > >
> > >
> > > On Tue, Jan 14, 2020 at 2:24 AM Stephan Ewen <se...@apache.org> wrote:
> > >
> > > > Would be good to hear the thoughts of some more Yarn users, though.
> > > >
> > > > On Mon, Jan 13, 2020 at 7:23 PM Stephan Ewen <se...@apache.org>
> wrote:
> > > >
> > > > > I think we need an interpretation of "-tm" regardless of what is in
> > the
> > > > > default configuration, because we can always have a modified
> > > > configuration
> > > > > and then a user passes the "-tm" flag.
> > > > >
> > > > > I kind of like the first proposal: Interpret "-tm" as "override
> > memory
> > > > > size config and set the Yarn TM container size." It would mean
> > exactly
> > > > > ignoring "taskmanager.memory.flink.size" and using the "-tm" value
> > as "
> > > > > "taskmanager.memory.process.size".
> > > > > That does not sound too bad to me.
> > > > >
> > > > > Best,
> > > > > Stephan
> > > > >
> > > > >
> > > > > On Mon, Jan 13, 2020 at 5:35 PM Andrey Zagrebin <
> > azagre...@apache.org>
> > > > > wrote:
> > > > >
> > > > >> Hi all,
> > > > >>
> > > > >> While working on changing process memory to Flink memory in
> default
> > > > >> configuration, Xintong encountered a problem.
> > > > >> When -tm option is used to rewrite container memory, basically
> > process
> > > > >> memory, it can collide with the default Flink memory.
> > > > >> For legacy users it should not be a problem as we adjusted the
> > legacy
> > > > heap
> > > > >> size option to be interpreted differently for standalone and
> > container
> > > > >> modes.
> > > > >>
> > > > >> One solution could be to say in -tm docs that we rewrite both
> > options
> > > > >> under
> > > > >> the hood: process and Flink memory, basically unset Flink memory
> > from
> > > > yaml
> > > > >> config.
> > > > >> The downside is that this adds more magic.
> > > > >>
> > > > >> Alternatively, we can keep process memory in default config and,
> as
> > > > >> mentioned before, increase it to maintain the user experience by
> > > > matching
> > > > >> the previous default setting for heap (now Flink in standalone)
> > size.
> > > > >> The Flink memory can be mentioned in process memory comment as a
> > > simpler
> > > > >> alternative which does not require accounting for JVM overhead.
> > > > >> The downside is again more confusion while trying out Flink and
> > tuning
> > > > >> memory at the same time.
> > > > >> On the other hand, if memory already needs to be tuned it should
> > > > >> quite quickly lead to the necessity of understanding the memory
> > model
> > > in
> > > > >> Flink.
> > > > >>
> > > > >> Best,
> > > > >> Andrey
> > > > >>
> > > > >> On Thu, Jan 9, 2020 at 12:27 PM Stephan Ewen <se...@apache.org>
> > > wrote:
> > > > >>
> > > > >> > Great! Thanks, guys, for the continued effort on this topic!
> > > > >> >
> > > > >> > On Thu, Jan 9, 2020 at 5:27 AM Xintong Song <
> > tonysong...@gmail.com>
> > > > >> wrote:
> > > > >> >
> > > > >> > > Thanks all for the discussion. I believe we have get consensus
> > on
> > > > all
> > > > >> the
> > > > >> > > open questions discussed in this thread.
> > > > >> > >
> > > > >> > > Since Andrey already create a jira ticket for renaming shuffle
> > > > memory
> > > > >> > > config keys with "taskmanager.memory.network.*", I'll create
> > > ticket
> > > > >> for
> > > > >> > the
> > > > >> > > other topic that puts flink.size in flink-conf.yaml.
> > > > >> > >
> > > > >> > > Thank you~
> > > > >> > >
> > > > >> > > Xintong Song
> > > > >> > >
> > > > >> > >
> > > > >> > >
> > > > >> > > On Wed, Jan 8, 2020 at 9:39 PM Andrey Zagrebin <
> > > > azagre...@apache.org>
> > > > >> > > wrote:
> > > > >> > >
> > > > >> > > > It also looks to me that we should only swap network and
> > memory
> > > in
> > > > >> the
> > > > >> > > > option names: 'taskmanager.memory.network.*'.
> > > > >> > > > There is no strong consensus towards using new 'shuffle'
> > naming
> > > so
> > > > >> we
> > > > >> > can
> > > > >> > > > just rename it to  'taskmanager.memory.network.*' as
> 'shuffle'
> > > > >> naming
> > > > >> > has
> > > > >> > > > never been released.
> > > > >> > > > When we have other shuffle services and start advertising
> more
> > > > this
> > > > >> > > concept
> > > > >> > > > in Flink, we could revisit again the whole naming for this
> > > > concept.
> > > > >> > > > https://jira.apache.org/jira/browse/FLINK-15517
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > > >
> > > >
> > >
> >
>

Reply via email to