[
https://issues.apache.org/jira/browse/HUDI-5289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
zouxxyy updated HUDI-5289:
--------------------------
Description:
Step:
{code:bash}
spark-submit \
  --class org.apache.hudi.utilities.HoodieClusteringJob \
  --conf spark.driver.memory=40G \
  --conf spark.executor.instances=20 \
  --conf spark.executor.memory=40G \
  --conf spark.executor.cores=4 \
  hudi-utilities-bundle_2.11-0.12.0.jar \
  --props clusteringjob.properties \
  --mode scheduleAndExecute \
  --base-path xxx \
  --table-name xxx \
  --spark-memory 40g
{code}
The two Spark stages below belong to the job above. Both are related to
computing WriteStatus, and stage 96 is a recomputation.
!image-2022-11-29-10-24-08-853.png|width=1560,height=57!
Here is stage 65:
!image-2022-11-29-10-25-29-546.png|width=640,height=515!
Here is stage 96:
!image-2022-11-29-10-26-22-050.png|width=643,height=435!
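For readers unfamiliar with this kind of recomputation, here is a minimal plain-Python sketch of one common way it arises: a lazily evaluated dataset is consumed by two actions without being persisted, so its lineage runs twice. This is an analogy only, not Hudi or Spark code. The names `LazyDataset`, `write_files`, and `run_two_actions` are hypothetical, and whether this pattern is what actually triggers stage 96 in this job is an assumption.

```python
from typing import Callable, List, Optional


class LazyDataset:
    """Toy stand-in for a lazily evaluated dataset (not a Spark API)."""

    def __init__(self, compute: Callable[[], List[str]]) -> None:
        self._compute = compute
        self._cached: Optional[List[str]] = None
        self._persisted = False
        self.evaluations = 0  # how many times the lineage was executed

    def persist(self) -> "LazyDataset":
        # Mark the dataset so the first materialization is kept for reuse.
        self._persisted = True
        return self

    def _materialize(self) -> List[str]:
        if self._cached is not None:
            return self._cached
        self.evaluations += 1
        result = self._compute()
        if self._persisted:
            self._cached = result
        return result

    def count(self) -> int:
        return len(self._materialize())

    def collect(self) -> List[str]:
        return list(self._materialize())


def write_files() -> List[str]:
    # Hypothetical expensive step standing in for producing write statuses.
    return [f"status-{i}" for i in range(3)]


def run_two_actions(dataset: LazyDataset) -> int:
    # Two actions on the same dataset: a count followed by a collect.
    dataset.count()
    dataset.collect()
    return dataset.evaluations
```

Calling `run_two_actions(LazyDataset(write_files))` executes the expensive step once per action, while `run_two_actions(LazyDataset(write_files).persist())` executes it only once and serves the second action from the kept result.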
> WriteStatus RDD is recalculated in cluster
> ------------------------------------------
>
> Key: HUDI-5289
> URL: https://issues.apache.org/jira/browse/HUDI-5289
> Project: Apache Hudi
> Issue Type: Improvement
> Components: spark
> Reporter: zouxxyy
> Priority: Major
> Attachments: image-2022-11-29-10-24-08-853.png,
> image-2022-11-29-10-25-29-546.png, image-2022-11-29-10-26-22-050.png
>
>
> Step:
> {code:bash}
> spark-submit \
>   --class org.apache.hudi.utilities.HoodieClusteringJob \
>   --conf spark.driver.memory=40G \
>   --conf spark.executor.instances=20 \
>   --conf spark.executor.memory=40G \
>   --conf spark.executor.cores=4 \
>   hudi-utilities-bundle_2.11-0.12.0.jar \
>   --props clusteringjob.properties \
>   --mode scheduleAndExecute \
>   --base-path xxx \
>   --table-name xxx \
>   --spark-memory 40g
> {code}
> The two Spark stages below belong to the job above. Both are related to
> computing WriteStatus, and stage 96 is a recomputation.
> !image-2022-11-29-10-24-08-853.png|width=1560,height=57!
> Here is stage 65:
> !image-2022-11-29-10-25-29-546.png|width=640,height=515!
> Here is stage 96:
> !image-2022-11-29-10-26-22-050.png|width=643,height=435!