[ 
https://issues.apache.org/jira/browse/FLINK-23560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17392978#comment-17392978
 ] 

Anton Kalashnikov commented on FLINK-23560:
-------------------------------------------

[~pnowojski] My conclusion about why it happens:

First of all, a little Java theory: a synchronized block that is only ever 
entered by one thread is cheaper than one that is entered by several threads. 
While only one thread owns the mutex, the information about the owning thread 
is stored in the header of the lock object. Once several threads want to own 
the object, the JVM creates a separate structure for this information, which 
requires extra hops on each synchronization.
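
A minimal sketch of how this effect could be observed in isolation. All names 
here (LockOwnerSketch, timedLoop, touchLockFromOtherThread) are made up for 
illustration and are not part of Flink. The description above resembles what 
HotSpot calls biased locking, which is disabled by default since JDK 15, so on 
newer JVMs, or if the JIT optimizes the loop, the two timings may be similar. 
Treat the numbers as indicative only.

```java
final class LockOwnerSketch {
    private static final Object LOCK = new Object();
    // Only ever modified by the thread that calls timedLoop.
    private static long counter;

    // Increments the counter under LOCK and returns the elapsed nanoseconds.
    public static long timedLoop(int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            synchronized (LOCK) {
                counter++;
            }
        }
        return System.nanoTime() - start;
    }

    // A second thread takes LOCK once, releases it immediately, and exits.
    public static void touchLockFromOtherThread() {
        Thread other = new Thread(() -> {
            synchronized (LOCK) {
                // intentionally empty: acquiring the lock is the only effect
            }
        });
        other.start();
        try {
            other.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static long counter() {
        return counter;
    }

    public static void main(String[] args) {
        int n = 50_000_000;
        timedLoop(n); // warm-up so that the loop is JIT-compiled
        long singleOwner = timedLoop(n);
        touchLockFromOtherThread();
        long afterSecondThread = timedLoop(n);
        System.out.printf("single owner: %d ms, after second thread: %d ms%n",
                singleOwner / 1_000_000, afterSecondThread / 1_000_000);
    }
}
```

The second thread never competes for the lock while the loop is running; it 
only acquires it once in between the two measurements.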

One more important observation: none of the problematic benchmarks use timers 
or checkpoints, so effectively only one thread is active during the execution. 
More precisely, *LegacySourceFunctionThread* does the main work (emits the data 
under the checkpoint lock) and *MailboxProcessorThread* does nothing until 
*LegacySourceFunctionThread* has finished; then *MailboxProcessorThread* just 
finishes the task.

FLINK-23452 introduced an action that is submitted to the mailbox for 
execution. What this action does is not really important (the performance drop 
reproduces even with an empty action). What matters is that while 
*LegacySourceFunctionThread* is doing its job (emitting records under the 
checkpoint lock), *MailboxProcessorThread* wakes up and executes the action via 
*SynchronizedStreamTaskActionExecutor* (the synchronized version of 
*StreamTaskActionExecutor*), which synchronizes on the same checkpoint lock 
that *LegacySourceFunctionThread* uses. So as soon as *MailboxProcessorThread* 
takes the lock for the first time, *LegacySourceFunctionThread* becomes slower 
because of the more expensive synchronization mentioned at the beginning.
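
The interaction between the two threads can be sketched roughly as follows. 
This is a simplified stand-in, not the real Flink code: SynchronizedExecutor, 
simulate and the thread names are hypothetical, and the only thing taken from 
the description above is that both threads synchronize on the same lock object.

```java
final class SharedCheckpointLockSketch {

    // Hypothetical stand-in for a synchronized action executor:
    // every action runs while holding the shared lock.
    static final class SynchronizedExecutor {
        private final Object lock;

        SynchronizedExecutor(Object lock) {
            this.lock = lock;
        }

        void run(Runnable action) {
            synchronized (lock) {
                action.run();
            }
        }
    }

    // Returns {recordsEmitted, actionsExecuted}.
    public static long[] simulate(int records) {
        final Object checkpointLock = new Object();
        final long[] emitted = {0};
        final long[] actions = {0};
        SynchronizedExecutor executor = new SynchronizedExecutor(checkpointLock);

        // "Source" thread: emits every record under the checkpoint lock.
        Thread source = new Thread(() -> {
            for (int i = 0; i < records; i++) {
                synchronized (checkpointLock) {
                    emitted[0]++;
                }
            }
        }, "source-thread");

        // "Mailbox" thread: runs a single action under the same lock.
        Thread mailbox = new Thread(
                () -> executor.run(() -> actions[0]++), "mailbox-thread");

        source.start();
        mailbox.start();
        try {
            source.join();
            mailbox.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return new long[] {emitted[0], actions[0]};
    }

    public static void main(String[] args) {
        long[] result = simulate(1_000_000);
        System.out.println("records=" + result[0] + ", actions=" + result[1]);
    }
}
```

Even a single action on the second thread is enough for the lock to have been 
owned by more than one thread, which is the situation described above.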

I confirmed this assumption by adding the following simple code before the 
execution of mainOperator inside 
SourceStreamTask.LegacySourceFunctionThread#run (in fact, it can be added 
anywhere in the code):

 
{code:java}
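// Another thread takes the lock once and releases it immediately.
new Thread(() -> {
   synchronized (lock) {
      // intentionally empty: acquiring the lock is the only effect
   }
}).start();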
{code}
This small change leads to the same degradation as FLINK-23452.

 

In general, this means that our current benchmarks are not very representative, 
because real jobs usually use checkpoints or the timeService, neither of which 
is used in these benchmarks. We can also turn off the feature from FLINK-23452 
for sources (input gates == 0) because calculating the throughput for sources 
doesn't make sense anyway.

 

> Performance regression on 29.07.2021
> ------------------------------------
>
>                 Key: FLINK-23560
>                 URL: https://issues.apache.org/jira/browse/FLINK-23560
>             Project: Flink
>          Issue Type: Bug
>          Components: Benchmarks
>    Affects Versions: 1.14.0
>            Reporter: Piotr Nowojski
>            Assignee: Anton Kalashnikov
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 1.14.0
>
>         Attachments: Screenshot 2021-07-30 at 15.46.54.png
>
>
> http://codespeed.dak8s.net:8000/timeline/?ben=remoteFilePartition&env=2
> http://codespeed.dak8s.net:8000/timeline/?ben=uncompressedMmapPartition&env=2
> http://codespeed.dak8s.net:8000/timeline/?ben=compressedFilePartition&env=2
> http://codespeed.dak8s.net:8000/timeline/?ben=tupleKeyBy&env=2
> http://codespeed.dak8s.net:8000/timeline/?ben=arrayKeyBy&env=2
> http://codespeed.dak8s.net:8000/timeline/?ben=uncompressedFilePartition&env=2
> http://codespeed.dak8s.net:8000/timeline/?ben=sortedTwoInput&env=2
> http://codespeed.dak8s.net:8000/timeline/?ben=sortedMultiInput&env=2
> http://codespeed.dak8s.net:8000/timeline/?ben=globalWindow&env=2
> (And potentially other benchmarks)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
