Thank you all for the good discussion.

If there are no further concerns or objections, I would like to conclude this
thread with the following action items (a rough sketch of the resulting
defaults is included below).

   - Change the default value of "taskmanager.memory.jvm-overhead.min" to
   192MB.
   - Change the default value of "taskmanager.memory.jvm-metaspace.size" to
   96MB.
   - Change the value of "taskmanager.memory.process.size" in the default
   "flink-conf.yaml" to 1568MB.
   - Relax the JVM overhead sanity check, so that the fraction does not need
   to be strictly followed, as long as the min/max range is respected.
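
For reference, here is a rough sketch of the resulting out-of-box memory
settings (only process.size is expected to appear in the shipped
flink-conf.yaml; the metaspace and overhead values would be the built-in
defaults of the options named above):

    taskmanager.memory.process.size: 1568m
    taskmanager.memory.jvm-metaspace.size: 96m    # built-in default
    taskmanager.memory.jvm-overhead.min: 192m     # built-in default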


Thank you~

Xintong Song



On Wed, Jan 15, 2020 at 5:50 PM Xintong Song <tonysong...@gmail.com> wrote:

> One more idea came out of the offline discussion with Andrey.
>
> If we decide to make metaspace 96MB, we can also make process.size 1568MB
> (1.5G + 32MB).
> According to the spreadsheet
> <https://docs.google.com/spreadsheets/d/1mJaMkMPfDJJ-w6nMXALYmTc4XxiV30P5U7DzgwLkSoE/edit#gid=0>,
> 1.5GB process size and 64MB metaspace result in memory sizes whose values
> are powers of 2.
> When increasing the metaspace from 64MB to 96MB, it would be good to
> preserve that alignment, for better readability when we later explain the
> memory configuration and calculations in the documentation.
> I believe the difference between 1.5GB and 1.5GB + 32MB is negligible in
> terms of memory consumption.
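> As a quick sketch of the arithmetic (assuming the JVM overhead stays at its
> proposed 192MB minimum), the total Flink memory is unchanged:
>
>     1536MB process - 192MB overhead - 64MB metaspace = 1280MB flink.size
>     1568MB process - 192MB overhead - 96MB metaspace = 1280MB flink.size
>
> so the derived sizes (e.g. 0.4 * 1280MB = 512MB managed memory) keep their
> power-of-2-friendly values.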
>
> Thank you~
>
> Xintong Song
>
>
>
> On Wed, Jan 15, 2020 at 11:55 AM Xintong Song <tonysong...@gmail.com>
> wrote:
>
>> Thanks for the discussion, Stephan, Till and Andrey.
>>
>> +1 for the managed fraction (0.4) and process.size (1.5G).
>>
>> *JVM overhead min 196 -> 192Mb (128 + 64)*
>>> a small correction for better power-of-2 alignment of sizes
>>>
>> Sorry, this was a typo (and the same for the JIRA comment, which was
>> copy-pasted). It was 192MB that was used in the tuning report.
>>
>> *meta space at least 96Mb?*
>>> There is still a concern about the JVM metaspace being just 64MB.
>>> We should confirm that it is not a problem by also testing it with SQL
>>> jobs using the Blink planner,
>>> and by running the TPC-DS e2e Flink tests with this setting - basically,
>>> cases where more classes are generated/loaded.
>>> We can look into this tomorrow.
>>>
>> I have already tried setting the metaspace to 64MB with the e2e tests,
>> where I believe various SQL / Blink / TPC-DS test cases are included. (See
>> https://travis-ci.com/flink-ci/flink/builds/142970113 )
>> However, I'm also ok with 96MB, since we are increasing the process.size
>> to 1.5GB.
>> My original concern with a larger metaspace size was that it may result
>> in a too small flink.size for the out-of-box configuration on
>> containerized setups.
>>
>> *sanity check of JVM overhead*
>>> When the explicitly configured process and flink memory sizes are
>>> verified against the JVM metaspace and overhead,
>>> the JVM overhead does not have to be the exact fraction.
>>> It just has to be within its min/max range, similar to how the
>>> network/shuffle memory check works after FLINK-15300.
>>>
>> Also +1 for this.
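>> Just to illustrate the relaxed check, here is a rough sketch (method and
>> parameter names are made up for illustration, this is not the actual
>> FLIP-49 code):
>>
>>     // When process.size and flink.size are both configured explicitly, the
>>     // derived JVM overhead only has to fall into [min, max]; it no longer
>>     // has to match fraction * processSize exactly.
>>     static void checkDerivedJvmOverhead(
>>             long processBytes, long flinkBytes, long metaspaceBytes,
>>             long overheadMinBytes, long overheadMaxBytes) {
>>         final long derivedOverhead = processBytes - flinkBytes - metaspaceBytes;
>>         if (derivedOverhead < overheadMinBytes
>>                 || derivedOverhead > overheadMaxBytes) {
>>             throw new IllegalArgumentException(
>>                 "Derived JVM overhead (" + derivedOverhead
>>                     + " bytes) is outside the configured min/max range.");
>>         }
>>     }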
>>
>> Thank you~
>>
>> Xintong Song
>>
>>
>>
>> On Wed, Jan 15, 2020 at 6:16 AM Andrey Zagrebin <azagre...@apache.org>
>> wrote:
>>
>>> Hi all,
>>>
>>> Stephan, Till and I had another offline discussion today. Here is the
>>> outcome of our brainstorm.
>>>
>>> *managed fraction 0.4*
>>> just confirmed what we already discussed here.
>>>
>>> *process.size = 1536MB (1.5GB)*
>>> We agreed to have process.size in the default settings, with an
>>> explanation of the flink.size alternative in a comment.
>>> The suggestion is to increase it from 1024MB to 1536MB. As you can see
>>> in the calculation spreadsheet provided earlier,
>>> this will result in a bigger JVM heap and managed memory (both ~0.5GB)
>>> for all new setups.
>>> This should provide a good enough experience for trying out Flink.
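>>> (For reference, a rough sketch of that split: 1536MB process - 192MB
>>> overhead - 64MB metaspace leaves 1280MB of total Flink memory which,
>>> assuming the 0.4 managed fraction and the default framework/network
>>> settings, ends up as roughly 512MB of JVM heap and 512MB of managed
>>> memory.)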
>>>
>>> *JVM overhead min 196 -> 192Mb (128 + 64)*
>>> a small correction for better power-of-2 alignment of sizes
>>>
>>> *meta space at least 96Mb?*
>>> There is still a concern about the JVM metaspace being just 64MB.
>>> We should confirm that it is not a problem by also testing it with SQL
>>> jobs using the Blink planner,
>>> and by running the TPC-DS e2e Flink tests with this setting - basically,
>>> cases where more classes are generated/loaded.
>>> We can look into this tomorrow.
>>>
>>> *sanity check of JVM overhead*
>>> When the explicitly configured process and flink memory sizes are
>>> verified against the JVM metaspace and overhead,
>>> the JVM overhead does not have to be the exact fraction.
>>> It just has to be within its min/max range, similar to how the
>>> network/shuffle memory check works after FLINK-15300.
>>>
>>> Best,
>>> Andrey
>>>
>>> On Tue, Jan 14, 2020 at 4:30 PM Stephan Ewen <se...@apache.org> wrote:
>>>
>>> > I like the idea of having a larger default "flink.size" in the
>>> > config.yaml. Maybe we don't need to double it, but something like
>>> > 1280m would be okay?
>>> >
>>> > On Tue, Jan 14, 2020 at 3:47 PM Andrey Zagrebin <azagre...@apache.org>
>>> > wrote:
>>> >
>>> > > Hi all!
>>> > >
>>> > > Great that we have already tried out the new FLIP-49 with bigger
>>> > > jobs.
>>> > >
>>> > > I am also +1 for the JVM metaspace and overhead changes.
>>> > >
>>> > > Regarding 0.3 vs 0.4 for managed memory, +1 for having more managed
>>> > > memory for the RocksDB limiting case.
>>> > >
>>> > > In general, this looks mostly to be about the memory distribution
>>> > > between the JVM heap and managed off-heap memory.
>>> > > Compared to the previous default setup, the JVM heap dropped
>>> > > (especially for standalone), mostly due to moving managed memory from
>>> > > heap to off-heap and then also adding framework off-heap memory.
>>> > > In general, this can be the most important consequence for beginners
>>> > > and those who rely on the default configuration.
>>> > > This is especially true for the legacy default configuration in
>>> > > standalone mode, where heap.size falls back to flink.size, but there
>>> > > it seems we cannot do too much now.
>>> > >
>>> > > I prepared a spreadsheet
>>> > > <https://docs.google.com/spreadsheets/d/1mJaMkMPfDJJ-w6nMXALYmTc4XxiV30P5U7DzgwLkSoE>
>>> > > to play with the numbers for the setups mentioned in the report.
>>> > >
>>> > > One idea would be to set the process size (or, respectively, a
>>> > > smaller flink size) to a bigger default number, like 2048.
>>> > > In this case, the derived absolute defaults for JVM heap and managed
>>> > > memory are close to the previous defaults, especially for managed
>>> > > fraction 0.3.
>>> > > This should align the defaults with the previous standalone try-out
>>> > > experience, where the increased off-heap memory is not strictly
>>> > > controlled by the environment anyway.
>>> > > The consequence for container users who relied on and updated the
>>> > > default configuration is that the containers will be requested with
>>> > > double the size.
>>> > >
>>> > > Best,
>>> > > Andrey
>>> > >
>>> > >
>>> > > On Tue, Jan 14, 2020 at 11:20 AM Till Rohrmann <trohrm...@apache.org
>>> >
>>> > > wrote:
>>> > >
>>> > > > +1 for the JVM metaspace and overhead changes.
>>> > > >
>>> > > > On Tue, Jan 14, 2020 at 11:19 AM Till Rohrmann <
>>> trohrm...@apache.org>
>>> > > > wrote:
>>> > > >
>>> > > >> I guess one of the most important results of this experiment is to
>>> > > >> have a good tuning guide available for users who are past the
>>> > > >> initial try-out phase, because the default settings will be kind of
>>> > > >> a compromise. I assume that this is part of the outstanding FLIP-49
>>> > > >> documentation task.
>>> > > >>
>>> > > >> If we limit RocksDB's memory consumption by default, then I believe
>>> > > >> that 0.4 would give the better all-round experience as it leaves a
>>> > > >> bit more memory for RocksDB. However, I'm a bit sceptical whether we
>>> > > >> should optimize the default settings for a configuration where the
>>> > > >> user still needs to activate the strict memory limiting for RocksDB.
>>> > > >> In this case, I would expect that the user could also adapt the
>>> > > >> managed memory fraction.
>>> > > >>
>>> > > >> Cheers,
>>> > > >> Till
>>> > > >>
>>> > > >> On Tue, Jan 14, 2020 at 3:39 AM Xintong Song <
>>> tonysong...@gmail.com>
>>> > > >> wrote:
>>> > > >>
>>> > > >>> Thanks for the feedback, Stephan and Kurt.
>>> > > >>>
>>> > > >>> @Stephan
>>> > > >>>
>>> > > >>> Regarding the managed memory fraction,
>>> > > >>> - It makes sense to keep the default value 0.4, if we assume
>>> > > >>> RocksDB memory is limited by default.
>>> > > >>> - AFAIK, RocksDB currently does not limit its memory usage by
>>> > > >>> default, and I'm positive about changing that.
>>> > > >>> - Personally, I don't like the idea that the out-of-box experience
>>> > > >>> (for which we set the default fraction) relies on users manually
>>> > > >>> turning another switch on.
>>> > > >>>
>>> > > >>> Regarding the framework heap memory,
>>> > > >>> - The major reason we set it by default is, as you mentioned, to
>>> > > >>> have a safety net of a minimal JVM heap size.
>>> > > >>> - Also, considering the in-progress FLIP-56 (dynamic slot
>>> > > >>> allocation), we want to reserve some heap memory that will not go
>>> > > >>> into the slot profiles. That's why we decided the default value
>>> > > >>> according to the heap memory usage of an empty task executor.
>>> > > >>>
>>> > > >>> @Kurt
>>> > > >>> Regarding the metaspace,
>>> > > >>> - This config option ("taskmanager.memory.jvm-metaspace") only
>>> > > >>> takes effect on TMs. Currently we do not set the metaspace size
>>> > > >>> for the JM.
>>> > > >>> - If we have the same metaspace problem on TMs, then yes, changing
>>> > > >>> it from 128MB to 64MB will make it worse. However, IMO a 10T TPC-DS
>>> > > >>> benchmark should not be considered an out-of-box experience, and it
>>> > > >>> makes sense to tune the configurations for it. I think the smaller
>>> > > >>> metaspace size would be a better choice for the first try-out,
>>> > > >>> where a job should not be too complicated and the TM size could be
>>> > > >>> relatively small (e.g. 1g).
>>> > > >>>
>>> > > >>> Thank you~
>>> > > >>>
>>> > > >>> Xintong Song
>>> > > >>>
>>> > > >>>
>>> > > >>>
>>> > > >>> On Tue, Jan 14, 2020 at 9:38 AM Kurt Young <ykt...@gmail.com>
>>> wrote:
>>> > > >>>
>>> > > >>>> Hi Xintong,
>>> > > >>>>
>>> > > >>>> IIRC, during our 10T TPC-DS benchmark we suffered from the JM's
>>> > > >>>> metaspace size and full GCs caused by lots of class loading for
>>> > > >>>> source input splits. Could you check whether changing the default
>>> > > >>>> value from 128MB to 64MB will make it worse?
>>> > > >>>>
>>> > > >>>> Correct me if I misunderstood anything, also cc @Jingsong
>>> > > >>>>
>>> > > >>>> Best,
>>> > > >>>> Kurt
>>> > > >>>>
>>> > > >>>>
>>> > > >>>> On Tue, Jan 14, 2020 at 3:44 AM Stephan Ewen <se...@apache.org>
>>> > > wrote:
>>> > > >>>>
>>> > > >>>>> Hi all!
>>> > > >>>>>
>>> > > >>>>> Thanks a lot, Xintong, for this thorough analysis. Based on your
>>> > > >>>>> analysis, here are some thoughts:
>>> > > >>>>>
>>> > > >>>>> +1 to change default JVM metaspace size from 128MB to 64MB
>>> > > >>>>> +1 to change default JVM overhead min size from 128MB to 196MB
>>> > > >>>>>
>>> > > >>>>> Concerning the managed memory fraction, I am not sure I would
>>> > > >>>>> change it, for the following reasons:
>>> > > >>>>>
>>> > > >>>>>   - We should assume RocksDB will be limited to managed memory
>>> > > >>>>> by default. This will either be active by default or we would
>>> > > >>>>> encourage everyone to use this by default, because otherwise it
>>> > > >>>>> is super hard to reason about the RocksDB footprint.
>>> > > >>>>>   - For standalone, a managed memory fraction of 0.3 is less
>>> > > >>>>> than half of the managed memory from 1.9.
>>> > > >>>>>   - I am not sure if the managed memory fraction is a value that
>>> > > >>>>> all users adjust immediately when scaling up the memory during
>>> > > >>>>> their first try-out phase. I would assume that most users
>>> > > >>>>> initially only adjust "memory.flink.size" or
>>> > > >>>>> "memory.process.size". A value of 0.3 will lead to having too
>>> > > >>>>> large heaps and very little RocksDB / batch memory even when
>>> > > >>>>> scaling up during the initial exploration.
>>> > > >>>>>   - I agree, though, that 0.5 looks too aggressive, from your
>>> > > >>>>> benchmarks. So maybe keeping it at 0.4 could work?
>>> > > >>>>>
>>> > > >>>>> And one question: Why do we set the framework heap by default?
>>> > > >>>>> Is that so we reduce the managed memory further if less than the
>>> > > >>>>> framework heap would be left from the JVM heap?
>>> > > >>>>>
>>> > > >>>>> Best,
>>> > > >>>>> Stephan
>>> > > >>>>>
>>> > > >>>>> On Thu, Jan 9, 2020 at 10:54 AM Xintong Song <
>>> > tonysong...@gmail.com>
>>> > > >>>>> wrote:
>>> > > >>>>>
>>> > > >>>>> > Hi all,
>>> > > >>>>> >
>>> > > >>>>> > As described in FLINK-15145 [1], we decided to tune the
>>> > > >>>>> > default configuration values of FLIP-49 with more jobs and
>>> > > >>>>> > cases.
>>> > > >>>>> >
>>> > > >>>>> > After spending time analyzing and tuning the configurations,
>>> > > >>>>> > I've come up with several findings. To be brief, I would
>>> > > >>>>> > suggest the following changes; for more details please take a
>>> > > >>>>> > look at my tuning report [2].
>>> > > >>>>> >
>>> > > >>>>> >    - Change default managed memory fraction from 0.4 to 0.3.
>>> > > >>>>> >    - Change default JVM metaspace size from 128MB to 64MB.
>>> > > >>>>> >    - Change default JVM overhead min size from 128MB to 196MB.
>>> > > >>>>> >
>>> > > >>>>> > Looking forward to your feedback.
>>> > > >>>>> >
>>> > > >>>>> > Thank you~
>>> > > >>>>> >
>>> > > >>>>> > Xintong Song
>>> > > >>>>> >
>>> > > >>>>> >
>>> > > >>>>> > [1] https://issues.apache.org/jira/browse/FLINK-15145
>>> > > >>>>> >
>>> > > >>>>> > [2]
>>> > > >>>>> > https://docs.google.com/document/d/1-LravhQYUIkXb7rh0XnBB78vSvhp3ecLSAgsiabfVkk/edit?usp=sharing
>>> > > >>>>> >
>>> > > >>>>> >
>>> > > >>>>>
>>> > > >>>>
>>> > >
>>> >
>>>
>>
