Zhilong Hong created FLINK-24300:
------------------------------------

             Summary: MultipleInputOperator is running much more slowly in TPCDS
                 Key: FLINK-24300
                 URL: https://issues.apache.org/jira/browse/FLINK-24300
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Network
    Affects Versions: 1.14.0, 1.15.0
            Reporter: Zhilong Hong
         Attachments: 64570e4c56955713ca599fd1d7ae7be891a314c6.png, 
detail-of-the-job.png, e3010c16947ed8da2ecb7d89a3aa08dacecc524a.png, jstack.txt

When running TPCDS with release 1.14, we found that the job with 
MultipleInputOperator runs much more slowly than before. By bisecting the 
commits, we found that the issue may have been introduced by 
FLINK-23408. 

At the commit 64570e4c56955713ca599fd1d7ae7be891a314c6, the job runs normally 
in TPCDS, as the image below illustrates:

!64570e4c56955713ca599fd1d7ae7be891a314c6.png|width=600!

At the commit e3010c16947ed8da2ecb7d89a3aa08dacecc524a, the job q2.sql gets 
stuck for a long time (more than half an hour), as the image below 
illustrates:

!e3010c16947ed8da2ecb7d89a3aa08dacecc524a.png|width=600!

The details of the job are shown below:

!detail-of-the-job.png|width=600!

The job uses a {{MultipleInputOperator}} with one normal input and two chained 
FileSources. It has finished reading the normal input and has started to read 
the chained sources. Each chained source has one source data fetcher.

We captured a jstack dump of the stuck tasks and attached it below. From 
[^jstack.txt] we can see that the main thread is blocked waiting for a lock, 
and that the lock is held by a source data fetcher. The source data fetcher is 
still running, but its stack trace stays in {{CompletableFuture.cleanStack}}.
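
The blocked-thread pattern described above can be reproduced in isolation. 
The following is a minimal sketch, not Flink code: the class name 
LockOwnerProbe, the thread names "source-data-fetcher" and 
"task-main-thread", and the plain monitor lock are all hypothetical stand-ins 
chosen for illustration. It uses the standard ThreadMXBean API to report 
which thread owns the lock a blocked thread is waiting for, which is the 
same information that jstack prints.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class LockOwnerProbe {

    /** Returns the name of the thread holding the lock that the blocked thread waits for. */
    public static String findLockOwner() throws InterruptedException {
        final Object lock = new Object();                 // hypothetical stand-in for the contended lock
        final CountDownLatch held = new CountDownLatch(1);
        final CountDownLatch release = new CountDownLatch(1);

        // Hypothetical "fetcher": takes the lock and keeps it until told to release.
        Thread fetcher = new Thread(() -> {
            synchronized (lock) {
                held.countDown();
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "source-data-fetcher");

        // Hypothetical "main thread": blocks trying to enter the same monitor.
        Thread mainThread = new Thread(() -> {
            synchronized (lock) {
                // Reached only after the fetcher releases the lock.
            }
        }, "task-main-thread");

        fetcher.setDaemon(true);
        mainThread.setDaemon(true);
        fetcher.start();
        held.await();            // make sure the fetcher owns the lock first
        mainThread.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        String owner = null;
        try {
            // Poll (for at most about 10 seconds) until the main thread is BLOCKED.
            for (int i = 0; i < 1000 && owner == null; i++) {
                ThreadInfo info = mx.getThreadInfo(mainThread.getId());
                if (info != null && info.getThreadState() == Thread.State.BLOCKED) {
                    owner = info.getLockOwnerName();
                }
                if (owner == null) {
                    Thread.sleep(10);
                }
            }
        } finally {
            release.countDown(); // let both threads finish
        }
        fetcher.join();
        mainThread.join();
        return owner;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("main thread is blocked by: " + findLockOwner());
    }
}
```

In a real job, the same lookup applied to the task's main thread would name 
the thread that holds the lock, without having to read the full dump by hand.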

This issue happens in a batch job. However, judging from where it gets 
blocked, it seems to affect streaming jobs as well.

For reference, the TPCDS code we are running is located at 
[https://github.com/ververica/flink-sql-benchmark/tree/dev].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
