[
https://issues.apache.org/jira/browse/HUDI-5068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Shizhi Chen updated HUDI-5068:
------------------------------
Description:
*CopyOnWriteInputFormat#createInputSplits* is invoked by
*org.apache.flink.runtime.executiongraph.ExecutionJobVertex* in the JobManager to
create file input splits synchronously.
In batch mode, this step accounts for the largest share of job submission
time.
This PR optimizes it by creating the input splits asynchronously in a thread
pool executor.
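
The idea can be illustrated with a minimal, dependency-free sketch. It is not the code from the PR: the class AsyncSplitSketch, the Split type, and the methods splitsForFile and createSplitsAsync are hypothetical names, and a single uniform file length stands in for the per-file status lookups a real input format would perform. It only shows the general pattern of submitting per-file split creation to a thread pool and collecting the results in submission order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncSplitSketch {

    /** Hypothetical stand-in for a file input split: a byte range within one file. */
    static final class Split {
        final String path;
        final long start;
        final long length;

        Split(String path, long start, long length) {
            this.path = path;
            this.start = start;
            this.length = length;
        }

        @Override
        public String toString() {
            return path + "[" + start + ", +" + length + "]";
        }
    }

    /** Per-file work. In a real job this is where the slow file-system calls would happen (assumption). */
    static List<Split> splitsForFile(String path, long fileLength, long splitSize) {
        List<Split> splits = new ArrayList<>();
        for (long offset = 0; offset < fileLength; offset += splitSize) {
            splits.add(new Split(path, offset, Math.min(splitSize, fileLength - offset)));
        }
        return splits;
    }

    /** Submits one task per file to a fixed-size pool, then joins the futures in submission order. */
    static List<Split> createSplitsAsync(List<String> paths, long fileLength, long splitSize, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<List<Split>>> futures = new ArrayList<>();
            for (String path : paths) {
                futures.add(CompletableFuture.supplyAsync(
                        () -> splitsForFile(path, fileLength, splitSize), pool));
            }
            List<Split> all = new ArrayList<>();
            for (CompletableFuture<List<Split>> future : futures) {
                all.addAll(future.join()); // blocks until that file's splits are ready
            }
            return all;
        } finally {
            pool.shutdown(); // release the worker threads once all tasks are done
        }
    }

    public static void main(String[] args) {
        List<Split> splits = createSplitsAsync(
                Arrays.asList("a.parquet", "b.parquet"), 250L, 100L, 4);
        splits.forEach(System.out::println);
    }
}
```

Joining the futures in submission order keeps the resulting split list deterministic even though the per-file tasks may finish in any order.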
> Support cow flink batch create fs input split asynchronously
> ------------------------------------------------------------
>
> Key: HUDI-5068
> URL: https://issues.apache.org/jira/browse/HUDI-5068
> Project: Apache Hudi
> Issue Type: Improvement
> Components: flink-sql, incremental-query, performance
> Reporter: Shizhi Chen
> Assignee: Shizhi Chen
> Priority: Blocker
> Fix For: 0.13.0
>
>
> *CopyOnWriteInputFormat#createInputSplits* is invoked by
> *org.apache.flink.runtime.executiongraph.ExecutionJobVertex* in the JobManager to
> create file input splits synchronously.
> In batch mode, this step accounts for the largest share of job submission
> time.
> This PR optimizes it by creating the input splits asynchronously in a thread
> pool executor.