[
https://issues.apache.org/jira/browse/FLINK-15103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16992128#comment-16992128
]
Xintong Song commented on FLINK-15103:
--------------------------------------
True, I was also discussing this issue with [~yunta], who is looking into the
regression with the RocksDB state backend. I did not know that the benchmarks
use LocalExecutor until I looked into their source code yesterday.
For LocalExecutor, the JVM heap size should not be affected by
taskmanager.memory.off-heap. The operators showing the regression also do not
use managed memory. So the only thing that comes to my mind is the number /
size of network buffers. Although we do not start a separate JVM process for
TaskExecutors in LocalExecutor, the number of network buffers used by
NettyShuffleEnvironment is calculated from the configuration when launching
the TaskExecutors.
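As a rough illustration of why the configuration matters even without a
separate JVM, the sketch below derives a buffer count from a total memory size,
a fraction, min/max bounds and a segment size. The class and method names are
hypothetical and this is not Flink's actual code; the numbers are illustrative
values, not taken from the benchmark setup.

```java
public final class NetworkBufferSketch {

    // Hypothetical helper: take a fraction of the total memory, clamp it
    // between min and max, then divide by the size of one buffer (segment).
    static long numberOfBuffers(
            long totalBytes, double fraction, long minBytes, long maxBytes, int segmentBytes) {
        long byFraction = (long) (totalBytes * fraction);
        long networkBytes = Math.max(minBytes, Math.min(maxBytes, byFraction));
        return networkBytes / segmentBytes;
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        // Illustrative values only (assumptions, not the benchmark configuration).
        long small = numberOfBuffers(256 * mib, 0.1, 64 * mib, 1024 * mib, 32 * 1024);
        long large = numberOfBuffers(1024 * mib, 0.1, 64 * mib, 1024 * mib, 32 * 1024);
        System.out.println("buffers with 256 MiB total: " + small);
        System.out.println("buffers with 1 GiB total:   " + large);
    }
}
```

If the input that plays the role of the total memory size changes between
commits, the resulting buffer count changes with it, which is the kind of
effect I am trying to confirm.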
Another thing to note is that, at the commit that introduced the regression
(4b8ed643a4d85c9440a8adbc0798b8a4bbd9520b), FLIP-49 was not yet activated and
TaskExecutors were still using the legacy logic to decide memory sizes. After
FLIP-49 was activated in a later commit
(9d1256ccbf8eb1556016b6805c3a91e2787d298a), the regression still exists. To
fully understand the problem, we need to analyze the regression under both
configuration logics.
I'm still running tests and reading the code, looking for evidence that
supports my hypothesis about the network buffer size. I will post updates here
if I find anything.
> Performance regression on 3.12.2019 in various benchmarks
> ---------------------------------------------------------
>
> Key: FLINK-15103
> URL: https://issues.apache.org/jira/browse/FLINK-15103
> Project: Flink
> Issue Type: Bug
> Components: Benchmarks
> Reporter: Piotr Nowojski
> Priority: Blocker
> Fix For: 1.10.0
>
>
> Various benchmarks show a performance regression that happened on December
> 3rd:
> [arrayKeyBy (probably the most easily
> visible)|http://codespeed.dak8s.net:8000/timeline/#/?exe=1&ben=arrayKeyBy&env=2&revs=200&equid=off&quarts=on&extr=on]
>
> [tupleKeyBy|http://codespeed.dak8s.net:8000/timeline/#/?exe=1&ben=tupleKeyBy&env=2&revs=200&equid=off&quarts=on&extr=on]
>
> [twoInputMapSink|http://codespeed.dak8s.net:8000/timeline/#/?exe=1&ben=twoInputMapSink&env=2&revs=200&equid=off&quarts=on&extr=on]
> [globalWindow (small
> one)|http://codespeed.dak8s.net:8000/timeline/#/?exe=1&ben=globalWindow&env=2&revs=200&equid=off&quarts=on&extr=on]
> and possibly others.
> Probably somewhere between those commits: -8403fd4- 2d67ee0..60b3f2f
--
This message was sent by Atlassian Jira
(v8.3.4#803005)