[
https://issues.apache.org/jira/browse/HUDI-5289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
zouxxyy updated HUDI-5289:
--------------------------
Description:
Step:
{code:java}
spark-submit \
--class org.apache.hudi.utilities.HoodieClusteringJob \
--conf spark.driver.memory=40G \
--conf spark.executor.instances=20 \
--conf spark.executor.memory=40G \
--conf spark.executor.cores=4 \
hudi-utilities-bundle_2.11-0.12.0.jar \
--props clusteringjob.properties \
--mode scheduleAndExecute \
--base-path xxx \
--table-name xxx \
--spark-memory 40g {code}
The following are the two stages of the job. Both are related to the
computation of WriteStatus, but some tasks in stage 96 were recomputed,
which took more than ten minutes.
!image-2022-11-29-10-24-08-853.png|width=1560,height=57!
here is stage 65
!image-2022-11-29-10-25-29-546.png|width=640,height=515!
here is stage 96
!image-2022-11-29-10-26-22-050.png|width=643,height=435!
was:
Step:
{code:java}
spark-submit \
--class org.apache.hudi.utilities.HoodieClusteringJob \
--conf spark.driver.memory=40G \
--conf spark.executor.instances=20 \
--conf spark.executor.memory=40G \
--conf spark.executor.cores=4 \
hudi-utilities-bundle_2.11-0.12.0.jar \
--props clusteringjob.properties \
--mode scheduleAndExecute \
--base-path xxx \
--table-name xxx \
--spark-memory 40g {code}
The following are the two Spark stages of the job above. Both are
related to the computation of WriteStatus; stage 96 is recomputed.
!image-2022-11-29-10-24-08-853.png|width=1560,height=57!
here is stage 65
!image-2022-11-29-10-25-29-546.png|width=640,height=515!
here is stage 96
!image-2022-11-29-10-26-22-050.png|width=643,height=435!
> WriteStatus RDD is recalculated in cluster
> ------------------------------------------
>
> Key: HUDI-5289
> URL: https://issues.apache.org/jira/browse/HUDI-5289
> Project: Apache Hudi
> Issue Type: Improvement
> Components: spark
> Reporter: zouxxyy
> Priority: Major
> Attachments: image-2022-11-29-10-24-08-853.png,
> image-2022-11-29-10-25-29-546.png, image-2022-11-29-10-26-22-050.png
>
>
> Step:
> {code:java}
> spark-submit \
> --class org.apache.hudi.utilities.HoodieClusteringJob \
> --conf spark.driver.memory=40G \
> --conf spark.executor.instances=20 \
> --conf spark.executor.memory=40G \
> --conf spark.executor.cores=4 \
> hudi-utilities-bundle_2.11-0.12.0.jar \
> --props clusteringjob.properties \
> --mode scheduleAndExecute \
> --base-path xxx \
> --table-name xxx \
> --spark-memory 40g {code}
> The following are the two stages of the job. Both are related to the
> computation of WriteStatus, but some tasks in stage 96 were recomputed,
> which took more than ten minutes.
> !image-2022-11-29-10-24-08-853.png|width=1560,height=57!
> here is stage 65
> !image-2022-11-29-10-25-29-546.png|width=640,height=515!
> here is stage 96
> !image-2022-11-29-10-26-22-050.png|width=643,height=435!
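
For readers unfamiliar with this symptom: one common reason a Spark stage is recomputed is that a lazily evaluated RDD is consumed by more than one action without being persisted. The report does not establish that this is the cause here. The sketch below is only a plain-Java analogy with no Spark or Hudi dependency; the class `RecomputeSketch` and the helpers `lazyWriteStatuses`, `persisted`, and `runTwoActions` are hypothetical names invented for illustration, not Hudi or Spark APIs.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class RecomputeSketch {
    // Counts how many times the expensive computation actually runs.
    static final AtomicInteger computeCount = new AtomicInteger();

    // Stand-in for a lazy lineage that produces write statuses (hypothetical).
    // Nothing runs until get() is called, and every get() runs it again.
    static Supplier<Integer> lazyWriteStatuses() {
        return () -> {
            computeCount.incrementAndGet();
            return 42; // placeholder result, not real data
        };
    }

    // Analogy for persisting: compute on first use, then reuse the cached value.
    static <T> Supplier<T> persisted(Supplier<T> source) {
        return new Supplier<T>() {
            private T value;
            private boolean done;

            @Override
            public synchronized T get() {
                if (!done) {
                    value = source.get();
                    done = true;
                }
                return value;
            }
        };
    }

    // Simulates two downstream actions consuming the same dataset and
    // returns how many times the underlying computation ran.
    static int runTwoActions(Supplier<Integer> statuses) {
        computeCount.set(0);
        statuses.get(); // first action
        statuses.get(); // second action
        return computeCount.get();
    }

    public static void main(String[] args) {
        System.out.println(runTwoActions(lazyWriteStatuses()));            // prints 2
        System.out.println(runTwoActions(persisted(lazyWriteStatuses()))); // prints 1
    }
}
```

In the analogy, the unpersisted supplier does its work once per consumer, while the persisted wrapper does it once in total. In Spark, cached partitions can also be evicted or lost with an executor, which would lead to partial recomputation of only some tasks.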