[
https://issues.apache.org/jira/browse/FLINK-25704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17481531#comment-17481531
]
Yingjie Cao commented on FLINK-25704:
-------------------------------------
After some analysis, I found that these tests are meant for
BoundedBlockingResultPartition, but they do not explicitly choose which
blocking shuffle to use; that is, they run with the default implementation and
configuration. FLINK-25636 changed the default blocking shuffle implementation,
so the tests are now testing SortMergeResultPartition instead of
BoundedBlockingResultPartition. There should be no regression for
BoundedBlockingResultPartition itself.
The reason BoundedBlockingResultPartition performs better than
SortMergeResultPartition in these tests is that SortMergeResultPartition
introduces some record copy overhead. For big records, this overhead is not a
big problem (no regression on TPC-DS). For small records, the copy overhead is
non-negligible, and these tests use small records (8-byte long values). I
created a new ticket, FLINK-25796, to improve this scenario.
In summary, the follow-up actions are:
# Make the existing benchmark tests still test BoundedBlockingResultPartition;
# Add new benchmark tests for SortMergeResultPartition;
# Try to optimize SortMergeResultPartition for small records;
# Document this default blocking shuffle change in both release notes and user
doc.
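The first two items could be sketched roughly as follows. This is only an illustration, not the actual benchmark change: the class and method names are hypothetical, and the two option keys (and the assumption that sort-shuffle is used once a job's parallelism reaches the configured minimum) are recalled from the Flink configuration documentation rather than taken from this thread, so they should be verified before use.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that pins a benchmark to one blocking shuffle implementation. */
public class BlockingShuffleBenchmarkConfig {

    // Assumed option keys; not stated in this thread, verify against the Flink docs.
    static final String SORT_SHUFFLE_MIN_PARALLELISM =
            "taskmanager.network.sort-shuffle.min-parallelism";
    static final String BLOCKING_SHUFFLE_TYPE =
            "taskmanager.network.blocking-shuffle.type";

    /**
     * Settings intended to keep a test on BoundedBlockingResultPartition:
     * the sort-shuffle threshold is raised so high that it is never reached.
     * shuffleType is expected to be "file" or "mmap" (assumed values).
     */
    static Map<String, String> boundedBlocking(String shuffleType) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put(SORT_SHUFFLE_MIN_PARALLELISM, String.valueOf(Integer.MAX_VALUE));
        conf.put(BLOCKING_SHUFFLE_TYPE, shuffleType);
        return conf;
    }

    /** Settings intended to select SortMergeResultPartition at any parallelism. */
    static Map<String, String> sortMerge() {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put(SORT_SHUFFLE_MIN_PARALLELISM, "1");
        return conf;
    }

    public static void main(String[] args) {
        // Print both variants so they can be compared or copied into a config file.
        System.out.println("BoundedBlockingResultPartition (file):");
        boundedBlocking("file").forEach((k, v) -> System.out.println("  " + k + ": " + v));
        System.out.println("SortMergeResultPartition:");
        sortMerge().forEach((k, v) -> System.out.println("  " + k + ": " + v));
    }
}
```

Setting the options explicitly in each benchmark would keep the results comparable even if the default changes again.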
Any suggestions?
> Performance regression on 18.01.2022 in batch network benchmarks
> ----------------------------------------------------------------
>
> Key: FLINK-25704
> URL: https://issues.apache.org/jira/browse/FLINK-25704
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Network
> Affects Versions: 1.15.0
> Reporter: Piotr Nowojski
> Priority: Critical
>
> http://codespeed.dak8s.net:8000/timeline/#/?exe=1,3&ben=compressedFilePartition&env=2&revs=200&equid=off&quarts=on&extr=on
> http://codespeed.dak8s.net:8000/timeline/#/?exe=1,3&ben=uncompressedFilePartition&env=2&revs=200&equid=off&quarts=on&extr=on
> http://codespeed.dak8s.net:8000/timeline/#/?exe=1,3&ben=uncompressedMmapPartition&env=2&revs=200&equid=off&quarts=on&extr=on
> Suspected range:
> {code}
> git ls eeec246677..f5c99c6f26
> f5c99c6f26 [5 weeks ago] [FLINK-17321][table] Add support casting of map to
> map and multiset to multiset [Sergey Nuyanzin]
> 745cfec705 [24 hours ago] [hotfix][table-common] Fix InternalDataUtils for
> MapData tests [Timo Walther]
> ed699b6ee6 [6 days ago] [FLINK-25637][network] Make sort-shuffle the default
> shuffle implementation for batch jobs [kevin.cyj]
> 4275525fed [6 days ago] [FLINK-25638][network] Increase the default write
> buffer size of sort-shuffle to 16M [kevin.cyj]
> e1878fb899 [6 days ago] [FLINK-25639][network] Increase the default read
> buffer size of sort-shuffle to 64M [kevin.cyj]
> {code}
> [~kevin.cyj], it looks like your change most likely caused this?