[
https://issues.apache.org/jira/browse/FLINK-15103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16992451#comment-16992451
]
Xintong Song edited comment on FLINK-15103 at 12/10/19 11:47 AM:
-----------------------------------------------------------------
[~pnowojski]
Totally agree on the suggested documentation / release note changes. We are
currently collecting the changes and their potential effects on different
scenarios that should be mentioned, by tuning the configurations with various
jobs. I think this benchmark problem could be a very good case. It probably
makes sense for us to draft the release notes / tuning guides after double
checking the scenarios and having a better understanding of what could be
affected.
And there's one more thing I'd like to get consensus on for this particular
issue, before moving on to implementation.
The number of network buffers depends on the JVM max heap size, which may
differ between machines. That means while there are 12945 buffers on my
laptop, the number could be different when running on [Flink Speed
Center|http://codespeed.dak8s.net:8000].
In order to configure the benchmarks to always have the same number of network
buffers as before, we need to read the JVM max heap size and derive the
network memory size with the legacy configuration logic. This could be done in
the setup stage (not included in the measured statistics), but the
corresponding code has already been removed from Flink, so we would need to
port it into flink-benchmark (probably as a util method). If we decide to
configure the network buffers to some smaller value later, after the release,
we would also need to remove this code again.
An alternative is to simply check the default JVM heap size on [Flink Speed
Center|http://codespeed.dak8s.net:8000], and configure the number of network
buffers to match that heap size. This should easily give us the same
performance on that particular site, but will not work the same way for
developers / contributors who run the benchmarks locally.
I'm personally in favor of porting the legacy code to flink-benchmark. WDYT?
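To make the porting option more concrete, here is a rough sketch of what such
a util method might look like. The class and method names are hypothetical.
The formula and the defaults (10% fraction, 64 MB min, 1 GB max, 32 KB
segments) are assumptions about the legacy logic, not a copy of the removed
code, so they would need to be checked against the old Flink sources.

```java
/** Hypothetical util for flink-benchmark; NOT the code that was removed from Flink. */
public final class LegacyNetworkBuffers {

    // Assumed pre-1.10 defaults; verify against the removed Flink code.
    static final double ASSUMED_FRACTION = 0.1;           // share of memory for network buffers
    static final long ASSUMED_MIN_BYTES = 64L << 20;      // 64 MB lower bound
    static final long ASSUMED_MAX_BYTES = 1L << 30;       // 1 GB upper bound
    static final int ASSUMED_SEGMENT_BYTES = 32 * 1024;   // 32 KB per buffer

    private LegacyNetworkBuffers() {}

    /**
     * Derives the network memory size from the max heap size. The heap is
     * treated as the part left over after network memory was taken out, so the
     * total is heap / (1 - fraction) and the network share is that total times
     * fraction, clamped to [minBytes, maxBytes].
     */
    public static long networkMemoryBytes(
            long maxHeapBytes, double fraction, long minBytes, long maxBytes) {
        long derived = (long) (maxHeapBytes / (1.0 - fraction) * fraction);
        return Math.min(maxBytes, Math.max(minBytes, derived));
    }

    /** Number of buffers that fit into the derived network memory (assumed defaults). */
    public static int numBuffers(long maxHeapBytes) {
        long bytes = networkMemoryBytes(
                maxHeapBytes, ASSUMED_FRACTION, ASSUMED_MIN_BYTES, ASSUMED_MAX_BYTES);
        return (int) (bytes / ASSUMED_SEGMENT_BYTES);
    }

    public static void main(String[] args) {
        // Read the max heap of the current JVM, as the setup stage would do.
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.println("max heap bytes: " + maxHeap);
        System.out.println("derived network buffers: " + numBuffers(maxHeap));
    }
}
```

In flink-benchmark this could be called once in the setup stage, with the
result passed to whichever option sets the number of network buffers. Since
the inputs are only the heap size and a few constants, removing it again later
would be a small change.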
> Performance regression on 3.12.2019 in various benchmarks
> ---------------------------------------------------------
>
> Key: FLINK-15103
> URL: https://issues.apache.org/jira/browse/FLINK-15103
> Project: Flink
> Issue Type: Bug
> Components: Benchmarks
> Reporter: Piotr Nowojski
> Priority: Blocker
> Fix For: 1.10.0
>
>
> Various benchmarks show a performance regression that happened on December
> 3rd:
> [arrayKeyBy (probably the most easily
> visible)|http://codespeed.dak8s.net:8000/timeline/#/?exe=1&ben=arrayKeyBy&env=2&revs=200&equid=off&quarts=on&extr=on]
>
> [tupleKeyBy|http://codespeed.dak8s.net:8000/timeline/#/?exe=1&ben=tupleKeyBy&env=2&revs=200&equid=off&quarts=on&extr=on]
>
> [twoInputMapSink|http://codespeed.dak8s.net:8000/timeline/#/?exe=1&ben=twoInputMapSink&env=2&revs=200&equid=off&quarts=on&extr=on]
> [globalWindow (small
> one)|http://codespeed.dak8s.net:8000/timeline/#/?exe=1&ben=globalWindow&env=2&revs=200&equid=off&quarts=on&extr=on]
> and possibly others.
> Probably somewhere between those commits: -8403fd4- 2d67ee0..60b3f2f