One more idea from an offline discussion with Andrey.

If we decide to make metaspace 96MB, we can also make process.size 1568MB
(1.5G + 32MB).
According to the spreadsheet
<https://docs.google.com/spreadsheets/d/1mJaMkMPfDJJ-w6nMXALYmTc4XxiV30P5U7DzgwLkSoE/edit#gid=0>,
a 1.5GB process size and 64MB metaspace result in memory sizes whose
values are powers of 2.
When increasing the metaspace from 64MB to 96MB, it would be good to
preserve that alignment, for better readability when we later explain
the memory configuration and calculations in the documentation.
I believe the difference between 1.5GB and 1.5GB + 32MB is negligible
in terms of memory consumption.
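
As a rough sketch, the variant above would look like this in the default
flink-conf.yaml (assuming the 192MB JVM overhead minimum discussed below,
and the current FLIP-49 option names, which may still change):

taskmanager.memory.process.size: 1568m       # 1.5G + 32MB
taskmanager.memory.jvm-metaspace.size: 96m
taskmanager.memory.jvm-overhead.min: 192m
# derived: flink.size = 1568 - 96 - 192 = 1280m, i.e. the same aligned
# value as with a 1536m process size and 64m metaspace (1536 - 64 - 192).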

Thank you~

Xintong Song



On Wed, Jan 15, 2020 at 11:55 AM Xintong Song <tonysong...@gmail.com> wrote:

> Thanks for the discussion, Stephan, Till and Andrey.
>
> +1 for the managed fraction (0.4) and process.size (1.5G).
>
>> *JVM overhead min 196 -> 192Mb (128 + 64)*
>> a small correction for better power-of-2 alignment of sizes
>>
> Sorry, this was a typo (and the same for the jira comment, which was
> copy-pasted). It was 192mb that was used in the tuning report.
>
>> *metaspace at least 96Mb?*
>> There is still a concern about the JVM metaspace being just 64Mb.
>> We should confirm that it is not a problem by also testing it with SQL
>> jobs using the Blink planner,
>> and by running the tpc-ds e2e Flink tests with this setting; basically,
>> cases where more classes are generated/loaded.
>> We can look into this tomorrow.
>>
> I have already tried setting the metaspace to 64Mb with the e2e tests,
> where I believe various sql / blink / tpc-ds test cases are included. (See
> https://travis-ci.com/flink-ci/flink/builds/142970113 )
> However, I'm also ok with 96Mb, since we are increasing the process.size
> to 1.5G.
> My original concern with having a larger metaspace size was that we may
> end up with too small a flink.size for the out-of-box configuration on
> containerized setups.
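>
> As a rough illustration of that concern (assuming a 192Mb overhead
> minimum; the numbers are a sketch only):
>
> taskmanager.memory.process.size: 1024m  # a small containerized setup
> # derived flink.size = process.size - metaspace - overhead:
> #   with 64Mb metaspace: 1024 - 64 - 192 = 768m
> #   with 96Mb metaspace: 1024 - 96 - 192 = 736m
> # every Mb added to the metaspace comes directly out of flink.size.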
>
>> *sanity check of JVM overhead*
>> When the explicitly configured process and flink memory sizes are
>> verified against the JVM metaspace and overhead,
>> the JVM overhead does not have to be the exact fraction.
>> It can simply be within its min/max range, similar to how the
>> network/shuffle memory check works after FLINK-15300.
>>
> Also +1 for this.
>
> Thank you~
>
> Xintong Song
>
>
>
> On Wed, Jan 15, 2020 at 6:16 AM Andrey Zagrebin <azagre...@apache.org>
> wrote:
>
>> Hi all,
>>
>> Stephan, Till and I had another offline discussion today. Here is the
>> outcome of our brainstorm.
>>
>> *managed fraction 0.4*
>> just confirmed what we already discussed here.
>>
>> *process.size = 1536Mb (1.5Gb)*
>> We agreed to have process.size in the default settings, with an
>> explanation of the flink.size alternative in the comment.
>> The suggestion is to increase it from 1024 to 1536mb. As you can see in
>> the earlier provided calculation spreadsheet,
>> it will result in a bigger JVM heap and managed memory (both ~0.5Gb)
>> for all new setups.
>> This should provide a good enough experience for trying out Flink.
>>
>> *JVM overhead min 196 -> 192Mb (128 + 64)*
>> a small correction for better power-of-2 alignment of sizes
>>
>> *metaspace at least 96Mb?*
>> There is still a concern about the JVM metaspace being just 64Mb.
>> We should confirm that it is not a problem by also testing it with SQL
>> jobs using the Blink planner,
>> and by running the tpc-ds e2e Flink tests with this setting; basically,
>> cases where more classes are generated/loaded.
>> We can look into this tomorrow.
>>
>> *sanity check of JVM overhead*
>> When the explicitly configured process and flink memory sizes are
>> verified against the JVM metaspace and overhead,
>> the JVM overhead does not have to be the exact fraction.
>> It can simply be within its min/max range, similar to how the
>> network/shuffle memory check works after FLINK-15300.
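>>
>> For example (a sketch with assumed numbers):
>>
>> taskmanager.memory.process.size: 1536m
>> taskmanager.memory.flink.size: 1280m
>> taskmanager.memory.jvm-metaspace.size: 64m
>> # implied overhead = 1536 - 1280 - 64 = 192m; this should pass as long
>> # as jvm-overhead.min <= 192m <= jvm-overhead.max, even though it is
>> # not exactly fraction * process.size (e.g. 0.1 * 1536 = 153.6m).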
>>
>> Best,
>> Andrey
>>
>> On Tue, Jan 14, 2020 at 4:30 PM Stephan Ewen <se...@apache.org> wrote:
>>
>> > I like the idea of having a larger default "flink.size" in the
>> > config.yaml.
>> > Maybe we don't need to double it, but something like 1280m would be
>> > okay?
>> >
>> > On Tue, Jan 14, 2020 at 3:47 PM Andrey Zagrebin <azagre...@apache.org>
>> > wrote:
>> >
>> > > Hi all!
>> > >
>> > > Great that we have already tried out the new FLIP-49 with bigger jobs.
>> > >
>> > > I am also +1 for the JVM metaspace and overhead changes.
>> > >
>> > > Regarding 0.3 vs 0.4 for managed memory, +1 for having more managed
>> > > memory for the Rocksdb limiting case.
>> > >
>> > > In general, this looks mostly to be about the memory distribution
>> > > between the JVM heap and managed off-heap memory.
>> > > Compared to the previous default setup, the JVM heap dropped
>> > > (especially for standalone), mostly due to moving managed memory from
>> > > heap to off-heap and also adding framework off-heap memory.
>> > > This can be the most important consequence for beginners and those
>> > > who rely on the default configuration,
>> > > especially the legacy default configuration in standalone mode, where
>> > > heap.size falls back to flink.size; but there it seems we cannot do
>> > > much now.
>> > >
>> > > I prepared a spreadsheet
>> > > <https://docs.google.com/spreadsheets/d/1mJaMkMPfDJJ-w6nMXALYmTc4XxiV30P5U7DzgwLkSoE>
>> > > to play with the numbers for the setups mentioned in the report.
>> > >
>> > > One idea would be to set the process size (or, respectively, a
>> > > smaller flink size) to a bigger default number, like 2048.
>> > > In this case, the derived absolute default JVM heap and managed
>> > > memory are close to the previous defaults, especially for managed
>> > > fraction 0.3.
>> > > This should align the defaults with the previous standalone try-out
>> > > experience, where the increased off-heap memory is not strictly
>> > > controlled by the environment anyway.
>> > > The consequence for container users who relied on the default
>> > > configuration is that the containers will be requested with double
>> > > the size.
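>> > >
>> > > (A rough sketch of the 2048 idea, assuming 128m metaspace and ~192m
>> > > overhead; the exact numbers are illustrative only:
>> > > taskmanager.memory.process.size: 2048m
>> > > # derived flink.size ~= 2048 - 128 - 192 = 1728m
>> > > # managed ~= 0.3 * 1728 ~= 518m, or ~= 691m with fraction 0.4,
>> > > # with the rest split between the JVM heap and network memory.)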
>> > >
>> > > Best,
>> > > Andrey
>> > >
>> > >
>> > > On Tue, Jan 14, 2020 at 11:20 AM Till Rohrmann <trohrm...@apache.org>
>> > > wrote:
>> > >
>> > > > +1 for the JVM metaspace and overhead changes.
>> > > >
>> > > > On Tue, Jan 14, 2020 at 11:19 AM Till Rohrmann <
>> trohrm...@apache.org>
>> > > > wrote:
>> > > >
>> > > >> I guess one of the most important results of this experiment is to
>> > > >> have a good tuning guide available for users who are past the
>> > > >> initial try-out phase, because the default settings will be kind of
>> > > >> a compromise. I assume that this is part of the outstanding FLIP-49
>> > > >> documentation task.
>> > > >>
>> > > >> If we limit RocksDB's memory consumption by default, then I believe
>> > > >> that 0.4 would give the better all-round experience as it leaves a
>> > > >> bit more memory for RocksDB. However, I'm a bit sceptical whether we
>> > > >> should optimize the default settings for a configuration where the
>> > > >> user still needs to activate the strict memory limiting for RocksDB.
>> > > >> In this case, I would expect that the user could also adapt the
>> > > >> managed memory fraction.
>> > > >>
>> > > >> Cheers,
>> > > >> Till
>> > > >>
>> > > >> On Tue, Jan 14, 2020 at 3:39 AM Xintong Song <
>> tonysong...@gmail.com>
>> > > >> wrote:
>> > > >>
>> > > >>> Thanks for the feedback, Stephan and Kurt.
>> > > >>>
>> > > >>> @Stephan
>> > > >>>
>> > > >>> Regarding managed memory fraction,
>> > > >>> - It makes sense to keep the default value 0.4 if we assume
>> > > >>> rocksdb memory is limited by default.
>> > > >>> - AFAIK, rocksdb currently does not limit its memory usage by
>> > > >>> default, and I'm positive about changing that.
>> > > >>> - Personally, I don't like the idea that the out-of-box experience
>> > > >>> (for which we set the default fraction) relies on users manually
>> > > >>> turning another switch on.
>> > > >>>
>> > > >>> Regarding framework heap memory,
>> > > >>> - The major reason we set it by default is, as you mentioned, to
>> > > >>> have a safety net of a minimal JVM heap size.
>> > > >>> - Also, considering the in-progress FLIP-56 (dynamic slot
>> > > >>> allocation), we want to reserve some heap memory that will not go
>> > > >>> into the slot profiles. That's why we decided the default value
>> > > >>> according to the heap memory usage of an empty task executor.
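>> > > >>>
>> > > >>> (As a sketch, with the option name assumed:
>> > > >>> taskmanager.memory.framework.heap.size: 128m
>> > > >>> # reserved for the framework itself; with FLIP-56 this part
>> > > >>> # would be excluded from the memory cut into slot profiles.)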
>> > > >>>
>> > > >>> @Kurt
>> > > >>> Regarding metaspace,
>> > > >>> - This config option ("taskmanager.memory.jvm-metaspace") only
>> > > >>> takes effect on TMs. Currently we do not set a metaspace size for
>> > > >>> the JM.
>> > > >>> - If we have the same metaspace problem on TMs, then yes, changing
>> > > >>> it from 128M to 64M will make it worse. However, IMO a 10T tpc-ds
>> > > >>> benchmark should not be considered an out-of-box experience, and
>> > > >>> it makes sense to tune the configurations for it. I think the
>> > > >>> smaller metaspace size would be a better choice for a first
>> > > >>> try-out, where a job should not be too complicated and the TM size
>> > > >>> could be relatively small (e.g. 1g).
>> > > >>>
>> > > >>> Thank you~
>> > > >>>
>> > > >>> Xintong Song
>> > > >>>
>> > > >>>
>> > > >>>
>> > > >>> On Tue, Jan 14, 2020 at 9:38 AM Kurt Young <ykt...@gmail.com>
>> wrote:
>> > > >>>
>> > > >>>> Hi Xintong,
>> > > >>>>
>> > > >>>> IIRC, during our tpc-ds 10T benchmark we suffered from the JM's
>> > > >>>> metaspace size and full GCs caused by lots of class loading for
>> > > >>>> source input splits. Could you check whether changing the default
>> > > >>>> value from 128MB to 64MB will make it worse?
>> > > >>>>
>> > > >>>> Correct me if I misunderstood anything, also cc @Jingsong
>> > > >>>>
>> > > >>>> Best,
>> > > >>>> Kurt
>> > > >>>>
>> > > >>>>
>> > > >>>> On Tue, Jan 14, 2020 at 3:44 AM Stephan Ewen <se...@apache.org>
>> > > wrote:
>> > > >>>>
>> > > >>>>> Hi all!
>> > > >>>>>
>> > > >>>>> Thanks a lot, Xintong, for this thorough analysis. Based on your
>> > > >>>>> analysis,
>> > > >>>>> here are some thoughts:
>> > > >>>>>
>> > > >>>>> +1 to change default JVM metaspace size from 128MB to 64MB
>> > > >>>>> +1 to change default JVM overhead min size from 128MB to 196MB
>> > > >>>>>
>> > > >>>>> Concerning the managed memory fraction, I am not sure I would
>> > change
>> > > >>>>> it,
>> > > >>>>> for the following reasons:
>> > > >>>>>
>> > > >>>>>   - We should assume RocksDB will be limited to managed memory
>> > > >>>>> by default. This will either be active by default, or we would
>> > > >>>>> encourage everyone to use this by default, because otherwise it
>> > > >>>>> is super hard to reason about the RocksDB footprint.
>> > > >>>>>   - For standalone, a managed memory fraction of 0.3 is less
>> > > >>>>> than half of the managed memory from 1.9.
>> > > >>>>>   - I am not sure if the managed memory fraction is a value
>> > > >>>>> that all users adjust immediately when scaling up the memory
>> > > >>>>> during their first try-out phase. I would assume that most users
>> > > >>>>> initially only adjust "memory.flink.size" or
>> > > >>>>> "memory.process.size". A value of 0.3 will lead to having too
>> > > >>>>> large heaps and very little RocksDB / batch memory, even when
>> > > >>>>> scaling up during the initial exploration.
>> > > >>>>>   - I agree, though, that 0.5 looks too aggressive, from your
>> > > >>>>> benchmarks. So maybe keeping it at 0.4 could work?
>> > > >>>>>
>> > > >>>>> And one question: Why do we set the Framework Heap by default?
>> > > >>>>> Is that so we reduce the managed memory further if less than
>> > > >>>>> the framework heap would be left from the JVM heap?
>> > > >>>>>
>> > > >>>>> Best,
>> > > >>>>> Stephan
>> > > >>>>>
>> > > >>>>> On Thu, Jan 9, 2020 at 10:54 AM Xintong Song <
>> > tonysong...@gmail.com>
>> > > >>>>> wrote:
>> > > >>>>>
>> > > >>>>> > Hi all,
>> > > >>>>> >
>> > > >>>>> > As described in FLINK-15145 [1], we decided to tune the
>> default
>> > > >>>>> > configuration values of FLIP-49 with more jobs and cases.
>> > > >>>>> >
>> > > >>>>> > After spending time analyzing and tuning the configurations,
>> > > >>>>> > I've come up with several findings. To be brief, I would
>> > > >>>>> > suggest the following changes, and for more details please
>> > > >>>>> > take a look at my tuning report [2].
>> > > >>>>> >
>> > > >>>>> >    - Change default managed memory fraction from 0.4 to 0.3.
>> > > >>>>> >    - Change default JVM metaspace size from 128MB to 64MB.
>> > > >>>>> >    - Change default JVM overhead min size from 128MB to 196MB.
>> > > >>>>> >
>> > > >>>>> > Looking forward to your feedback.
>> > > >>>>> >
>> > > >>>>> > Thank you~
>> > > >>>>> >
>> > > >>>>> > Xintong Song
>> > > >>>>> >
>> > > >>>>> >
>> > > >>>>> > [1] https://issues.apache.org/jira/browse/FLINK-15145
>> > > >>>>> >
>> > > >>>>> > [2]
>> > > >>>>> > https://docs.google.com/document/d/1-LravhQYUIkXb7rh0XnBB78vSvhp3ecLSAgsiabfVkk/edit?usp=sharing
>> > > >>>>> >
>> > > >>>>> >
>> > > >>>>>
>> > > >>>>
>> > >
>> >
>>
>
