[ https://issues.apache.org/jira/browse/HUDI-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Raymond Xu updated HUDI-2777:
-----------------------------
Sprint: (was: Cont' improve - 2021/01/18)
> Data import performance deteriorates because multiple Spark jobs are started
> when data is written to disks.
> -----------------------------------------------------------------------------------------------------------
>
> Key: HUDI-2777
> URL: https://issues.apache.org/jira/browse/HUDI-2777
> Project: Apache Hudi
> Issue Type: Improvement
> Components: spark
> Affects Versions: 0.9.0
> Environment: hudi 0.9.0
> spark3.1.1
> hive3.1.1
> hadoop3.1.1
> Reporter: liuhe0702
> Assignee: liuhe0702
> Priority: Critical
> Labels: hudi-on-call, pull-request-available, query-eng, sev:high
> Fix For: 0.11.0
>
>
> If the RDD has multiple partitions and RDD.isEmpty finally returns true,
> Spark starts several jobs in sequence, with the number of partitions scanned
> growing 5-fold after each job. These extra jobs degrade write performance.
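
The 5-fold pattern described above can be reproduced with a small simulation. This is only a sketch, not Spark's code: it assumes that RDD.isEmpty behaves like take(1), that the first job scans a single partition, and that each later job tries four times the number of partitions already scanned (a scale-up factor of 4, an assumption chosen because it matches the 5-fold growth in the report). The function name `jobs_for_is_empty` is hypothetical.

```python
def jobs_for_is_empty(total_partitions, scale_up_factor=4):
    """Simulate the jobs launched by an isEmpty-style scan when every
    partition is empty. Returns one (start, end) partition range per job.

    Assumption: the first job scans 1 partition, and each later job tries
    (partitions scanned so far) * scale_up_factor more partitions.
    """
    jobs = []
    parts_scanned = 0
    while parts_scanned < total_partitions:
        if parts_scanned == 0:
            num_parts_to_try = 1
        else:
            num_parts_to_try = parts_scanned * scale_up_factor
        # Never scan past the last partition.
        end = min(parts_scanned + num_parts_to_try, total_partitions)
        jobs.append((parts_scanned, end))
        parts_scanned = end
    return jobs


if __name__ == "__main__":
    for job_id, (start, end) in enumerate(jobs_for_is_empty(200)):
        print(f"job {job_id}: partitions [{start}, {end}), total scanned = {end}")
```

Under these assumptions, 200 empty partitions need five jobs, with 1, 5, 25, 125, and 200 partitions scanned in total after each job. A single non-incremental emptiness check would avoid the repeated job launches.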