[jira] [Updated] (FLINK-25335) Improvement of task deployment by enabling async source split enumeration

KevinyhZou (Jira) Wed, 15 Dec 2021 19:18:04 -0800


     [ https://issues.apache.org/jira/browse/FLINK-25335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


KevinyhZou updated FLINK-25335:
-------------------------------
    Description: 
When an OLAP query is submitted through the Flink client to a Flink session
cluster, the JobMaster starts scheduling, enumerates the Hive source splits with
`HiveSourceFileEnumerator`, and only then deploys and executes the query tasks.
If the source table has many partitions and the partition files are large,
split enumeration takes a long time. This blocks task deployment and execution
until enumeration completes, and the job does not appear on the dashboard in
the meantime.

!image-2021-12-16-11-14-36-030.png!

It would be better to enumerate the Hive splits asynchronously and deploy and
execute the query tasks in the meantime. Once deployment is finished, the source
operators fetch splits and read data while split enumeration continues.
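A minimal sketch of the proposed behaviour, assuming plain Java threads rather than Flink's actual `SplitEnumerator` / `SplitEnumeratorContext` APIs. The enumeration loop runs on a background executor, so `start` returns immediately and deployment is not blocked, while readers pull splits as soon as each partition has been listed. `AsyncSplitEnumerator`, `enumeratePartition`, and `nextSplit` are hypothetical names, and the blocking `nextSplit` is a simplification: in Flink's FLIP-27 source model, readers send split requests and the enumerator assigns splits asynchronously.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Queue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch, not Flink's API: split enumeration runs on a background
// thread so that start() returns immediately and task deployment can proceed.
final class AsyncSplitEnumerator {
    private final Queue<String> pending = new ArrayDeque<>(); // guarded by "this"
    private boolean enumerationDone = false;                  // guarded by "this"
    private final ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "split-enumerator");
        t.setDaemon(true); // do not keep the JVM alive just for enumeration
        return t;
    });

    // Kicks off enumeration asynchronously and returns without waiting for it.
    void start(List<String> partitions) {
        executor.execute(() -> {
            try {
                for (String partition : partitions) {
                    List<String> splits = enumeratePartition(partition); // possibly slow
                    synchronized (this) {
                        pending.addAll(splits);
                        notifyAll(); // wake readers waiting for new splits
                    }
                }
            } finally {
                synchronized (this) {
                    enumerationDone = true; // set even on failure so readers never wait forever
                    notifyAll();
                }
            }
        });
    }

    // Hypothetical stand-in for listing the files of one Hive partition.
    private static List<String> enumeratePartition(String partition) {
        List<String> splits = new ArrayList<>();
        for (int i = 0; i < 2; i++) {
            splits.add(partition + "/split-" + i);
        }
        return splits;
    }

    // Called by a reader: waits until a split is available or enumeration has
    // finished; returns Optional.empty() once every split has been handed out.
    synchronized Optional<String> nextSplit() {
        while (pending.isEmpty() && !enumerationDone) {
            try {
                wait();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return Optional.empty();
            }
        }
        return Optional.ofNullable(pending.poll());
    }

    void close() {
        executor.shutdownNow();
    }
}
```

A reader loops on `nextSplit()` until it returns `Optional.empty()`, which only happens after enumeration has finished and the queue is drained, so reading the first partition's splits overlaps with listing the remaining partitions.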

  was:
When an OLAP query is submitted through the Flink client to a Flink session
cluster, the JobMaster starts scheduling, enumerates the Hive source splits with
`HiveSourceFileEnumerator`, and only then deploys and executes the query tasks.
If the source table has many partitions and the partition files are large,
split enumeration takes a long time. This blocks task deployment and execution
until enumeration completes, and the job does not appear on the dashboard in
the meantime.

!image-2021-12-16-11-14-36-030.png!

The JobMaster should enumerate the Hive splits asynchronously and deploy and
execute the query tasks in the meantime. Once deployment is finished, the source
operators fetch splits and read data while split enumeration continues.


> Improvement of task deployment by enabling async source split enumeration
> -------------------------------------------------------------------------
>
>                 Key: FLINK-25335
>                 URL: https://issues.apache.org/jira/browse/FLINK-25335
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Coordination
>    Affects Versions: 1.12.1
>            Reporter: KevinyhZou
>            Priority: Major
>         Attachments: image-2021-12-16-11-14-36-030.png



--
This message was sent by Atlassian Jira
(v8.20.1#820001)