[ https://issues.apache.org/jira/browse/FLINK-25335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
KevinyhZou updated FLINK-25335:
-------------------------------
    Summary: Improvement of task deployment by enable source split asynchronous enumerate  (was: Improvement of task deployment by enable source split Asynchronous enumerate)

> Improvement of task deployment by enable source split asynchronous enumerate
> ----------------------------------------------------------------------------
>
>                 Key: FLINK-25335
>                 URL: https://issues.apache.org/jira/browse/FLINK-25335
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Coordination
>    Affects Versions: 1.12.1
>            Reporter: KevinyhZou
>            Priority: Major
>         Attachments: image-2021-12-16-11-14-36-030.png
>
>
> When an OLAP query is submitted through the Flink client to a Flink session
> cluster, the JobMaster starts scheduling and enumerates the Hive source splits
> with `HiveSourceFileEnumerator`, and only then deploys and executes the query
> tasks. If the source table has many partitions and the partition files are
> large, split enumeration takes a long time. This blocks task deployment and
> execution for that whole period, and the job does not show up on the
> dashboard in the meantime:
> !image-2021-12-16-11-14-36-030.png!
> It would be better to enumerate the Hive splits asynchronously and deploy and
> execute the query tasks in the meantime. Once deployment has finished, the
> source operators fetch splits and read data while split enumeration is still
> going on.
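The proposed behaviour (enumerate in the background, let readers start as soon as the first splits exist) is essentially a producer/consumer hand-off. A minimal sketch in plain Java, assuming an in-process queue between the enumerator and the reader: `AsyncSplitEnumerationSketch`, `listSplits`, and `NO_MORE_SPLITS` are hypothetical names, not Flink or Hive APIs. In Flink itself the enumeration would run inside the source's split enumerator on the JobMaster, and readers would request splits from it rather than share a queue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

public class AsyncSplitEnumerationSketch {

    // Hypothetical end-of-enumeration marker (not a Flink API).
    private static final String NO_MORE_SPLITS = "__NO_MORE_SPLITS__";

    /** Hypothetical stand-in for listing the files of one Hive partition. */
    static List<String> listSplits(int partition) {
        List<String> splits = new ArrayList<>();
        for (int file = 0; file < 2; file++) {
            splits.add("partition-" + partition + "/file-" + file);
        }
        return splits;
    }

    /** Enumerates splits on a background thread while the "reader" consumes them. */
    public static List<String> run(int partitions) throws Exception {
        BlockingQueue<String> splitQueue = new LinkedBlockingQueue<>();
        ExecutorService enumeratorThread = Executors.newSingleThreadExecutor();
        try {
            // 1. Start enumeration asynchronously instead of blocking deployment on it.
            CompletableFuture<Void> enumeration = CompletableFuture.runAsync(() -> {
                try {
                    for (int p = 0; p < partitions; p++) {
                        for (String split : listSplits(p)) {
                            splitQueue.put(split); // hand over each split as soon as it is known
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    splitQueue.add(NO_MORE_SPLITS); // always unblock the reader
                }
            }, enumeratorThread);

            // 2. "Deployment" is not delayed: the reader starts fetching splits immediately.
            List<String> readSplits = new ArrayList<>();
            while (true) {
                String split = splitQueue.take(); // waits only for the next split, not for all of them
                if (NO_MORE_SPLITS.equals(split)) {
                    break;
                }
                readSplits.add(split); // a real reader would read the split's data here
            }
            enumeration.join(); // surfaces any enumeration failure
            return readSplits;
        } finally {
            enumeratorThread.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> splits = run(3);
        System.out.println(splits.size() + " splits read");
    }
}
```

Running `main` prints `6 splits read` (3 partitions × 2 files each). The design choice here is that the reader blocks only until the next split is available, so time-to-first-record no longer depends on how long the full enumeration takes.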