[ https://issues.apache.org/jira/browse/FLINK-25335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

KevinyhZou updated FLINK-25335:
-------------------------------
    Summary: Improvement of task deployment by enable source split asynchronous 
enumerate  (was: Improvement of task deployment by enable source split 
Asynchronous enumerate)

> Improvement of task deployment by enable source split asynchronous enumerate
> ----------------------------------------------------------------------------
>
>                 Key: FLINK-25335
>                 URL: https://issues.apache.org/jira/browse/FLINK-25335
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Coordination
>    Affects Versions: 1.12.1
>            Reporter: KevinyhZou
>            Priority: Major
>         Attachments: image-2021-12-16-11-14-36-030.png
>
>
> When an OLAP query is submitted through the Flink client to a Flink session
> cluster, the JobMaster starts scheduling and enumerates the Hive source splits
> with `HiveSourceFileEnumerator`, and only then deploys and executes the query
> tasks. If the source table has many partitions and the partition files are
> large, split enumeration takes a long time. This blocks task deployment and
> execution for a long time, and the job does not appear on the dashboard:
> !image-2021-12-16-11-14-36-030.png!
> It would be better to enumerate the Hive splits asynchronously and deploy and
> execute the query tasks in the meantime. Once deployment has finished, the
> source operators fetch splits and read data while split enumeration is still
> in progress.
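
A minimal sketch of the proposed behavior, written in plain Java rather than against Flink's actual source API: a background thread enumerates splits partition by partition and pushes each one into a queue, while a stand-in "source reader" starts consuming immediately instead of waiting for the whole enumeration to finish. `AsyncSplitEnumerationSketch`, `listSplitsOfPartition`, and the end-of-splits sentinel are hypothetical names for illustration only; they are not part of Flink or `HiveSourceFileEnumerator`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

/**
 * Hypothetical sketch: splits are enumerated on a background thread and handed
 * to the reader as soon as each one is discovered, instead of enumerating
 * everything before the reader is deployed.
 */
public class AsyncSplitEnumerationSketch {

    // Hypothetical sentinel meaning "no more splits will arrive".
    private static final String NO_MORE_SPLITS = "__END__";

    /** Hypothetical stand-in for listing the files of one partition (slow in practice). */
    static List<String> listSplitsOfPartition(int partition) {
        List<String> splits = new ArrayList<>();
        for (int file = 0; file < 2; file++) {
            splits.add("partition-" + partition + "/file-" + file);
        }
        return splits;
    }

    /** Starts enumeration asynchronously and consumes splits while it is still running. */
    public static List<String> run(int numPartitions) throws InterruptedException {
        BlockingQueue<String> pending = new LinkedBlockingQueue<>();
        ExecutorService enumeratorThread = Executors.newSingleThreadExecutor();
        try {
            // Enumeration runs in the background, so the "reader" below is not blocked by it.
            CompletableFuture<Void> enumeration = CompletableFuture
                    .runAsync(() -> {
                        for (int p = 0; p < numPartitions; p++) {
                            pending.addAll(listSplitsOfPartition(p));
                        }
                    }, enumeratorThread)
                    // Always signal the end, even on failure, so the reader never hangs.
                    .whenComplete((ignored, error) -> pending.add(NO_MORE_SPLITS));

            // The "source reader" takes splits in the order they are enumerated.
            List<String> consumed = new ArrayList<>();
            while (true) {
                String split = pending.take();
                if (NO_MORE_SPLITS.equals(split)) {
                    break;
                }
                consumed.add(split); // a real reader would read the split's data here
            }
            enumeration.join(); // rethrows any enumeration failure
            return consumed;
        } finally {
            enumeratorThread.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> splits = run(3);
        System.out.println(splits.size() + " splits read");
    }
}
```

Calling `run(3)` yields six splits, consumed in the order they were enumerated. The sentinel is added in `whenComplete` so the reader is released even if enumeration fails, and `join()` then surfaces that failure to the caller.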



--
This message was sent by Atlassian Jira
(v8.20.1#820001)