[jira] [Comment Edited] (FLINK-15103) Performance regression on 3.12.2019 in various benchmarks

Xintong Song (Jira) Tue, 10 Dec 2019 03:48:30 -0800


    [ 
https://issues.apache.org/jira/browse/FLINK-15103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16992451#comment-16992451
 ]


Xintong Song edited comment on FLINK-15103 at 12/10/19 11:47 AM:
-----------------------------------------------------------------

[~pnowojski]

Totally agree on the suggested documentation / release note changes. We are 
currently collecting the changes, and their potential effects on different 
scenarios, that should be mentioned, by tuning the configurations with various 
jobs. I think this benchmark problem could be a very good case. It probably 
makes sense for us to draft the release notes / tuning guides after double 
checking the scenarios and having a better understanding of what could be 
affected.

And there's one more thing I'd like to get consensus on for this particular 
issue, before moving on to implementation.

The number of network buffers depends on the JVM max heap size, which can 
differ between machines. That means that while there are 12945 buffers on my 
laptop, the number could be different when running on [Flink Speed 
Center|http://codespeed.dak8s.net:8000].

In order to configure the benchmarks to always have the same number of network 
buffers as before, we need to read the JVM max heap size and derive the 
network memory size using the legacy configuration logic. This could be done in 
the setup stage (not included in the statistics), but the corresponding code has 
already been removed from Flink, so we would need to port it into 
flink-benchmark (probably as a util method). If we decide to configure the 
network buffers to some smaller value later after the release, we would also 
need to remove this code again.
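
To make the first option more concrete, here is a rough sketch of what such a 
util method might look like. It is not the removed Flink code. The class and 
method names are made up for illustration, and the constants (fraction 0.1, 
min 64 MB, max 1 GB, 32 KB per buffer) as well as the assumption that the heap 
counts as the remaining (1 - fraction) share of the total are my reading of 
the legacy defaults and should be checked against the old implementation 
before porting.

```java
// Hypothetical helper, not part of Flink or flink-benchmark.
final class LegacyNetworkBuffers {

    // Assumed legacy defaults; verify against the removed Flink code.
    static final double FRACTION = 0.1;           // share of total memory for network
    static final long MIN_BYTES = 64L << 20;      // 64 MB lower bound
    static final long MAX_BYTES = 1L << 30;       // 1 GB upper bound
    static final int SEGMENT_BYTES = 32 * 1024;   // 32 KB per network buffer

    /** Derives the network memory size from the JVM max heap size. */
    static long networkMemoryBytes(long maxHeapBytes) {
        // Assumption: the heap is the (1 - FRACTION) share of the total memory.
        long total = (long) (maxHeapBytes / (1.0 - FRACTION));
        long byFraction = (long) (total * FRACTION);
        // Clamp the fraction-based size into [MIN_BYTES, MAX_BYTES].
        return Math.min(MAX_BYTES, Math.max(MIN_BYTES, byFraction));
    }

    /** Number of whole buffers that fit into the derived network memory. */
    static int numberOfBuffers(long maxHeapBytes) {
        return (int) (networkMemoryBytes(maxHeapBytes) / SEGMENT_BYTES);
    }

    public static void main(String[] args) {
        // Would be called once in the benchmark setup stage.
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.println("max heap bytes:  " + maxHeap);
        System.out.println("network buffers: " + numberOfBuffers(maxHeap));
    }
}
```

The benchmark setup would then pass the computed number into whatever 
configuration the benchmarks use for the network buffers, so that the value 
follows the heap size of the machine the benchmark runs on.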

An alternative is to simply check the default JVM heap size on [Flink Speed 
Center|http://codespeed.dak8s.net:8000] and configure the number of network 
buffers to match that heap size. This should easily give us the same 
performance on that particular site, but will not work the same way for 
developers / contributors who run the benchmark locally.

I'm personally in favor of porting the legacy code to flink-benchmark. WDYT?


> Performance regression on 3.12.2019 in various benchmarks
> ---------------------------------------------------------
>
>                 Key: FLINK-15103
>                 URL: https://issues.apache.org/jira/browse/FLINK-15103
>             Project: Flink
>          Issue Type: Bug
>          Components: Benchmarks
>            Reporter: Piotr Nowojski
>            Priority: Blocker
>             Fix For: 1.10.0
>
>
> Various benchmarks show a performance regression that happened on December 
> 3rd:
> [arrayKeyBy (probably the most easily 
> visible)|http://codespeed.dak8s.net:8000/timeline/#/?exe=1&ben=arrayKeyBy&env=2&revs=200&equid=off&quarts=on&extr=on]
>  
> [tupleKeyBy|http://codespeed.dak8s.net:8000/timeline/#/?exe=1&ben=tupleKeyBy&env=2&revs=200&equid=off&quarts=on&extr=on]
>  
> [twoInputMapSink|http://codespeed.dak8s.net:8000/timeline/#/?exe=1&ben=twoInputMapSink&env=2&revs=200&equid=off&quarts=on&extr=on]
>  [globalWindow (small 
> one)|http://codespeed.dak8s.net:8000/timeline/#/?exe=1&ben=globalWindow&env=2&revs=200&equid=off&quarts=on&extr=on]
>  and possibly others.
> Probably somewhere between those commits: -8403fd4- 2d67ee0..60b3f2f



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
