[
https://issues.apache.org/jira/browse/FLINK-23560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17392978#comment-17392978
]
Anton Kalashnikov commented on FLINK-23560:
-------------------------------------------
[~pnowojski] My conclusion about why this happens:
First, a little Java theory: a synchronized block that is only ever entered by
one thread is cheaper than one that is entered by several threads. While only
one thread owns the mutex, the information about the owning thread is stored
in the header of the lock object. Once several threads want to own the object,
the JVM creates a separate structure for this information, which requires
extra hops on each synchronization.
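The theory above can be checked outside of Flink with a rough timing sketch
like the following. It is not a rigorous benchmark: the class and method names
are made up for illustration, and whether any difference shows up depends on
the JVM version and flags (for example, biased locking is disabled by default
since JDK 15).
{code:java}
public class LockCostSketch {
    static final Object LOCK = new Object();
    static long counter = 0;

    // Runs `iterations` synchronized increments on the calling thread
    // and returns the elapsed time in nanoseconds.
    static long timedLoop(long iterations) {
        long start = System.nanoTime();
        for (long i = 0; i < iterations; i++) {
            synchronized (LOCK) {
                counter++;
            }
        }
        return System.nanoTime() - start;
    }

    // Lets a second thread acquire and release the same lock once.
    static void touchLockFromOtherThread() {
        Thread other = new Thread(() -> {
            synchronized (LOCK) {
                // intentionally empty
            }
        });
        other.start();
        try {
            other.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        long n = 50_000_000L;
        timedLoop(n); // warm-up so the JIT has a chance to compile the loop
        long before = timedLoop(n);
        touchLockFromOtherThread();
        long after = timedLoop(n);
        System.out.println("before second thread: " + before / 1_000_000 + " ms");
        System.out.println("after second thread:  " + after / 1_000_000 + " ms");
    }
}
{code}
Here the second thread takes the lock while the first one is idle. Starting it
while the loop is still running would be closer to the situation in the
benchmarks.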
Another important observation is that none of the problematic benchmarks use
timers or checkpoints, so effectively only one thread is active during
execution. More precisely: *LegacySourceFunctionThread* does the main work
(emits the data under the checkpoint lock), while *MailboxProcessorThread* does
nothing until *LegacySourceFunctionThread* has finished, and then
*MailboxProcessorThread* just finishes the task.
FLINK-23452 introduced an action that is submitted to the mailbox for
execution. What this action does is not really important (the performance drop
reproduces even with an empty action). What matters is that while
*LegacySourceFunctionThread* is doing its job (emitting records under
checkpoint lock synchronization), *MailboxProcessorThread* wakes up and
executes the action via *SynchronizedStreamTaskActionExecutor* (the
synchronized version of *StreamTaskActionExecutor*), which synchronizes on the
same checkpoint lock that *LegacySourceFunctionThread* uses. So as soon as
*MailboxProcessorThread* takes the lock for the first time,
*LegacySourceFunctionThread* becomes slower because of the more expensive
synchronization that I mentioned at the beginning.
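The interaction described above can be modeled in a few lines. This is only a
simplified sketch: the ActionExecutor interface, the thread names, and
runScenario are hypothetical stand-ins, not Flink's actual classes. It shows
one thread emitting records under a lock while a second thread runs a single
empty action under the same lock.
{code:java}
import java.util.concurrent.CountDownLatch;

public class SharedCheckpointLockSketch {
    // Hypothetical stand-in for an action executor; not Flink's interface.
    interface ActionExecutor {
        void run(Runnable action);
    }

    // Returns an executor that runs every action while holding `lock`.
    static ActionExecutor synchronizedExecutor(Object lock) {
        return action -> {
            synchronized (lock) {
                action.run();
            }
        };
    }

    // Emits `records` records under the lock while a second thread executes
    // one empty action under the same lock. Returns the number emitted.
    static long runScenario(long records) {
        final Object checkpointLock = new Object();
        final long[] emitted = {0};
        final ActionExecutor executor = synchronizedExecutor(checkpointLock);
        final CountDownLatch sourceStarted = new CountDownLatch(1);

        Thread source = new Thread(() -> {
            sourceStarted.countDown();
            for (long i = 0; i < records; i++) {
                synchronized (checkpointLock) {
                    emitted[0]++; // stands in for emitting one record
                }
            }
        }, "legacy-source");

        Thread mailbox = new Thread(() -> {
            try {
                sourceStarted.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            executor.run(() -> { }); // empty action, takes the lock once
        }, "mailbox");

        source.start();
        mailbox.start();
        try {
            source.join();
            mailbox.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return emitted[0];
    }

    public static void main(String[] args) {
        System.out.println("emitted: " + runScenario(10_000_000L));
    }
}
{code}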
I confirmed this assumption by adding the following simple code before the
execution of mainOperator inside
SourceStreamTask.LegacySourceFunctionThread#run (in fact, it can be added
anywhere in the code):
{code:java}
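// A second thread acquires the checkpoint lock once and releases it immediately.
new Thread(() -> {
    synchronized (lock) {
        // intentionally empty
    }
}).start();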
{code}
This small change leads to the same degradation as FLINK-23452.
In general, this means that our benchmarks are currently not very
representative, because real jobs usually use checkpoints or the timeService,
neither of which is used in these benchmarks. We can also turn off the feature
from FLINK-23452 for sources (input gates == 0) because calculating the
throughput for sources doesn't make sense anyway.
> Performance regression on 29.07.2021
> ------------------------------------
>
> Key: FLINK-23560
> URL: https://issues.apache.org/jira/browse/FLINK-23560
> Project: Flink
> Issue Type: Bug
> Components: Benchmarks
> Affects Versions: 1.14.0
> Reporter: Piotr Nowojski
> Assignee: Anton Kalashnikov
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 1.14.0
>
> Attachments: Screenshot 2021-07-30 at 15.46.54.png
>
>
> http://codespeed.dak8s.net:8000/timeline/?ben=remoteFilePartition&env=2
> http://codespeed.dak8s.net:8000/timeline/?ben=uncompressedMmapPartition&env=2
> http://codespeed.dak8s.net:8000/timeline/?ben=compressedFilePartition&env=2
> http://codespeed.dak8s.net:8000/timeline/?ben=tupleKeyBy&env=2
> http://codespeed.dak8s.net:8000/timeline/?ben=arrayKeyBy&env=2
> http://codespeed.dak8s.net:8000/timeline/?ben=uncompressedFilePartition&env=2
> http://codespeed.dak8s.net:8000/timeline/?ben=sortedTwoInput&env=2
> http://codespeed.dak8s.net:8000/timeline/?ben=sortedMultiInput&env=2
> http://codespeed.dak8s.net:8000/timeline/?ben=globalWindow&env=2
> (And potentially other benchmarks)