[https://issues.apache.org/jira/browse/FLINK-25335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
KevinyhZou updated FLINK-25335:
-------------------------------
Summary: Improvement of task deployment by enable source split asynchronous
enumerate (was: Improvement of task deployment by enable source split
Asynchronous enumerate)
> Improvement of task deployment by enable source split asynchronous enumerate
> ----------------------------------------------------------------------------
>
> Key: FLINK-25335
> URL: https://issues.apache.org/jira/browse/FLINK-25335
> Project: Flink
> Issue Type: Sub-task
> Components: Runtime / Coordination
> Affects Versions: 1.12.1
> Reporter: KevinyhZou
> Priority: Major
> Attachments: image-2021-12-16-11-14-36-030.png
>
>
> When an OLAP query is submitted to a Flink session cluster through the Flink
> client, the JobMaster starts scheduling and enumerates the Hive source splits
> with `HiveSourceFileEnumerator`, and only then deploys and executes the query
> tasks. If the source table has many partitions and the partition files are
> large, split enumeration is slow and blocks task deployment and execution for
> a long time. Meanwhile, the job does not appear on the dashboard:
> !image-2021-12-16-11-14-36-030.png!
> It would be better to enumerate the Hive splits asynchronously and deploy and
> execute the query tasks in the meantime. Once deployment finishes, the source
> operators fetch splits and read data while split enumeration continues.
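The proposal can be sketched in plain Java. This is a minimal illustration under stated assumptions, not Flink code. A background thread plays the role of the enumerator and pushes splits into a queue, and the already "deployed" reader consumes them as they arrive, so reading starts before enumeration is finished. `Split`, `run`, and the `END` sentinel are hypothetical names. Inside Flink itself, a split enumerator would more plausibly hand the slow listing to `SplitEnumeratorContext#callAsync`, but that wiring is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;

public class AsyncSplitEnumerationSketch {

    // Hypothetical stand-in for a Hive file split: one chunk of one partition.
    record Split(String partition, int index) {}

    // Sentinel telling the reader that enumeration has finished.
    private static final Split END = new Split("__end__", -1);

    public static List<Split> run(List<String> partitions, int splitsPerPartition)
            throws Exception {
        BlockingQueue<Split> pending = new LinkedBlockingQueue<>();
        ExecutorService enumerator = Executors.newSingleThreadExecutor();

        // Enumeration runs in the background instead of blocking deployment.
        Future<?> enumeration = enumerator.submit(() -> {
            try {
                for (String partition : partitions) {
                    for (int i = 0; i < splitsPerPartition; i++) {
                        // A real enumerator would list partition files here (the slow part).
                        pending.add(new Split(partition, i));
                    }
                }
            } finally {
                // Always release the reader, even if enumeration fails.
                pending.add(END);
            }
        });

        // The "deployed" reader starts immediately and consumes splits as they arrive.
        List<Split> read = new ArrayList<>();
        while (true) {
            Split split = pending.take();
            if (split == END) {
                break;
            }
            read.add(split);
        }

        enumeration.get(); // rethrows any enumeration failure
        enumerator.shutdown();
        return read;
    }

    public static void main(String[] args) throws Exception {
        List<Split> splits = run(List.of("dt=2021-12-14", "dt=2021-12-15"), 3);
        System.out.println(splits.size() + " splits read");
    }
}
```

The queue decouples the two sides. The reader blocks only while it waits for the next split, never for the whole enumeration. Putting the sentinel in a `finally` block keeps a failed enumeration from hanging the reader, and the subsequent `get()` call rethrows the failure.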
--
This message was sent by Atlassian Jira
(v8.20.1#820001)