[
https://issues.apache.org/jira/browse/HUDI-5289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated HUDI-5289:
---------------------------------
Labels: pull-request-available (was: )
> WriteStatus RDD is recalculated in cluster
> ------------------------------------------
>
> Key: HUDI-5289
> URL: https://issues.apache.org/jira/browse/HUDI-5289
> Project: Apache Hudi
> Issue Type: Improvement
> Components: spark
> Reporter: zouxxyy
> Assignee: zouxxyy
> Priority: Major
> Labels: pull-request-available
> Attachments: image-2022-11-29-10-24-08-853.png,
> image-2022-11-29-10-25-29-546.png, image-2022-11-29-10-26-22-050.png
>
>
> Steps to reproduce:
> {code:java}
> spark-submit \
> --class org.apache.hudi.utilities.HoodieClusteringJob \
> --conf spark.driver.memory=40G \
> --conf spark.executor.instances=20 \
> --conf spark.executor.memory=40G \
> --conf spark.executor.cores=4 \
> hudi-utilities-bundle_2.11-0.12.0.jar \
> --props clusteringjob.properties \
> --mode scheduleAndExecute \
> --base-path xxx \
> --table-name xxx \
> --spark-memory 40g {code}
> The job runs the two stages shown below. Both compute WriteStatus, but
> some tasks in stage 96 are recalculated, which takes more than ten
> minutes.
> !image-2022-11-29-10-24-08-853.png|width=1560,height=57!
> Here is stage 65:
> !image-2022-11-29-10-25-29-546.png|width=640,height=515!
> Here is stage 96:
> !image-2022-11-29-10-26-22-050.png|width=643,height=435!
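
One plausible reading of the symptom above, not confirmed by this report, is that a lazily evaluated dataset is consumed by two separate actions without being cached, so the second action repeats the expensive work. The sketch below illustrates that general behavior in plain Python. It is not Spark or Hudi code: `LazyDataset`, `expensive_write`, and `run_two_actions` are hypothetical names introduced only for this illustration.

```python
from typing import Callable, Iterable, List, Optional


class LazyDataset:
    """Toy stand-in for a lazily evaluated dataset (hypothetical, not a Spark or Hudi class)."""

    def __init__(self, compute: Callable[[], Iterable[int]]) -> None:
        self._compute = compute
        self._persisted = False
        self._cache: Optional[List[int]] = None
        self.compute_calls = 0  # how many times the expensive work actually ran

    def persist(self) -> "LazyDataset":
        # Mark the dataset so the first materialization is kept for later actions.
        self._persisted = True
        return self

    def _materialize(self) -> List[int]:
        if self._cache is not None:
            return self._cache
        self.compute_calls += 1
        data = list(self._compute())
        if self._persisted:
            self._cache = data
        return data

    def count(self) -> int:
        # First "action": triggers evaluation.
        return len(self._materialize())

    def collect(self) -> List[int]:
        # Second "action": triggers evaluation again unless a cached copy exists.
        return list(self._materialize())


def expensive_write() -> Iterable[int]:
    # Placeholder for the costly step that produces per-file write results.
    return (i * i for i in range(5))


def run_two_actions(ds: LazyDataset) -> int:
    ds.count()
    ds.collect()
    return ds.compute_calls


uncached = LazyDataset(expensive_write)
cached = LazyDataset(expensive_write).persist()

uncached_calls = run_two_actions(uncached)
cached_calls = run_two_actions(cached)
print(uncached_calls, cached_calls)
```

In this toy model the unpersisted dataset runs the expensive step once per action, while the persisted one runs it once in total. Whether the same mechanism explains the recalculated tasks in stage 96 would need to be checked against the actual job lineage.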