Clearing the `flink.size` option and setting `process.size` could indeed be a solution. The thing I'm wondering is what would happen if the user has configured `task.heap.size` and `managed.size` instead of `flink.size`? Would we also ignore these settings? If not, then we risk to run into the situation that TaskExecutorResourceUtils fails because the memory does not add up. I guess the thing which bugs me a bit is the special casing which could lead easily into inconsistent behaviour if we don't cover all cases. The consequence of using slightly different concepts (`flink.size`, `process.size`) in standalone vs. container/Yarn/Mesos mode in order to keep the status quo is an increased maintenance overhead which we should be aware of.
Cheers, Till On Tue, Jan 14, 2020 at 3:59 AM Xintong Song <tonysong...@gmail.com> wrote: > True, even we have "process.size" rather than "flink.size" in the default > config file, user can still have "flink.size" in their custom config file. > If we consider "-tm" as a shortcut for users to override the TM memory > size, then it makes less sense to require users to remove "flink.size" from > their config file whenever then want to use "-tm". > > I'm convinced that ignoring "flink.size" might not be a bad idea. > And I think we should also update the description of "-tm" (in > "FlinkYarnSessionCli") to explicitly mention that it would overwrite > "flink.size" and "process.size". > > Thank you~ > > Xintong Song > > > > On Tue, Jan 14, 2020 at 2:24 AM Stephan Ewen <se...@apache.org> wrote: > > > Would be good to hear the thoughts of some more Yarn users, though. > > > > On Mon, Jan 13, 2020 at 7:23 PM Stephan Ewen <se...@apache.org> wrote: > > > > > I think we need an interpretation of "-tm" regardless of what is in the > > > default configuration, because we can always have a modified > > configuration > > > and then a user passes the "-tm" flag. > > > > > > I kind of like the first proposal: Interpret "-tm" as "override memory > > > size config and set the Yarn TM container size." It would mean exactly > > > ignoring "taskmanager.memory.flink.size" and using the "-tm" value as " > > > "taskmanager.memory.process.size". > > > That does not sound too bad to me. > > > > > > Best, > > > Stephan > > > > > > > > > On Mon, Jan 13, 2020 at 5:35 PM Andrey Zagrebin <azagre...@apache.org> > > > wrote: > > > > > >> Hi all, > > >> > > >> While working on changing process memory to Flink memory in default > > >> configuration, Xintong encountered a problem. > > >> When -tm option is used to rewrite container memory, basically process > > >> memory, it can collide with the default Flink memory. > > >> For legacy users it should not be a problem as we adjusted the legacy > > heap > > >> size option to be interpreted differently for standalone and container > > >> modes. > > >> > > >> One solution could be to say in -tm docs that we rewrite both options > > >> under > > >> the hood: process and Flink memory, basically unset Flink memory from > > yaml > > >> config. > > >> The downside is that this adds more magic. > > >> > > >> Alternatively, we can keep process memory in default config and, as > > >> mentioned before, increase it to maintain the user experience by > > matching > > >> the previous default setting for heap (now Flink in standalone) size. > > >> The Flink memory can be mentioned in process memory comment as a > simpler > > >> alternative which does not require accounting for JVM overhead. > > >> The downside is again more confusion while trying out Flink and tuning > > >> memory at the same time. > > >> On the other hand, if memory already needs to be tuned it should > > >> quite quickly lead to the necessity of understanding the memory model > in > > >> Flink. > > >> > > >> Best, > > >> Andrey > > >> > > >> On Thu, Jan 9, 2020 at 12:27 PM Stephan Ewen <se...@apache.org> > wrote: > > >> > > >> > Great! Thanks, guys, for the continued effort on this topic! > > >> > > > >> > On Thu, Jan 9, 2020 at 5:27 AM Xintong Song <tonysong...@gmail.com> > > >> wrote: > > >> > > > >> > > Thanks all for the discussion. I believe we have get consensus on > > all > > >> the > > >> > > open questions discussed in this thread. > > >> > > > > >> > > Since Andrey already create a jira ticket for renaming shuffle > > memory > > >> > > config keys with "taskmanager.memory.network.*", I'll create > ticket > > >> for > > >> > the > > >> > > other topic that puts flink.size in flink-conf.yaml. > > >> > > > > >> > > Thank you~ > > >> > > > > >> > > Xintong Song > > >> > > > > >> > > > > >> > > > > >> > > On Wed, Jan 8, 2020 at 9:39 PM Andrey Zagrebin < > > azagre...@apache.org> > > >> > > wrote: > > >> > > > > >> > > > It also looks to me that we should only swap network and memory > in > > >> the > > >> > > > option names: 'taskmanager.memory.network.*'. > > >> > > > There is no strong consensus towards using new 'shuffle' naming > so > > >> we > > >> > can > > >> > > > just rename it to 'taskmanager.memory.network.*' as 'shuffle' > > >> naming > > >> > has > > >> > > > never been released. > > >> > > > When we have other shuffle services and start advertising more > > this > > >> > > concept > > >> > > > in Flink, we could revisit again the whole naming for this > > concept. > > >> > > > https://jira.apache.org/jira/browse/FLINK-15517 > > >> > > > > > >> > > > > >> > > > >> > > > > > >