[ https://issues.apache.org/jira/browse/HUDI-5289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zouxxyy updated HUDI-5289:
--------------------------
    Description: 
Steps to reproduce:
{code:java}
spark-submit \
--class org.apache.hudi.utilities.HoodieClusteringJob \
--conf spark.driver.memory=40G \
--conf spark.executor.instances=20 \
--conf spark.executor.memory=40G \
--conf spark.executor.cores=4 \
hudi-utilities-bundle_2.11-0.12.0.jar \
--props clusteringjob.properties \
--mode scheduleAndExecute \
--base-path xxx \
--table-name xxx \
--spark-memory 40g {code}
The following are two stages of the Spark job above. Both are related to computing WriteStatus, and stage 96 recalculates it:

!image-2022-11-29-10-24-08-853.png|width=1560,height=57!

Stage 65:

!image-2022-11-29-10-25-29-546.png|width=640,height=515!

Stage 96:

!image-2022-11-29-10-26-22-050.png|width=643,height=435!
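
Background on why a later stage can redo the same work: Spark RDDs are evaluated lazily, and unless an RDD is persisted, every action re-runs its full lineage. The toy model below is a minimal sketch of that effect in plain Java, assuming the WriteStatus RDD is consumed by two separate actions. It is NOT the Spark or Hudi API: LazyDataset, RecomputeDemo and runWrites are hypothetical names, and the "write" step only stands in for the expensive clustering write that produces WriteStatus.

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.function.Supplier;

// Toy model of a lazily evaluated dataset; NOT the Spark RDD API (hypothetical class).
final class LazyDataset<T> {
    private final Supplier<List<T>> lineage; // recomputes the data from its source
    private boolean cacheEnabled;            // set by cache(), analogous to persist() in Spark
    private List<T> cached;                  // filled on first evaluation if caching is on

    LazyDataset(Supplier<List<T>> lineage) {
        this.lineage = lineage;
    }

    // Transformation: lazy, it only extends the lineage and computes nothing yet.
    <R> LazyDataset<R> map(Function<T, R> f) {
        return new LazyDataset<>(() -> {
            List<R> out = new ArrayList<>();
            for (T t : materialize()) {
                out.add(f.apply(t));
            }
            return out;
        });
    }

    // Keep the result of the first evaluation instead of recomputing it.
    LazyDataset<T> cache() {
        cacheEnabled = true;
        return this;
    }

    private List<T> materialize() {
        if (cached != null) {
            return cached;
        }
        List<T> result = lineage.get();
        if (cacheEnabled) {
            cached = result;
        }
        return result;
    }

    // Actions: each one evaluates the lineage unless a cached result exists.
    long count() {
        return materialize().size();
    }

    List<T> collect() {
        return new ArrayList<>(materialize());
    }
}

public class RecomputeDemo {
    // Hypothetical job: "write" each input file, then run two actions on the statuses.
    // Returns how many times the write function was invoked.
    static int runWrites(boolean useCache) {
        AtomicInteger writeCalls = new AtomicInteger();
        LazyDataset<String> inputs =
            new LazyDataset<>(() -> Arrays.asList("file-1", "file-2", "file-3"));
        LazyDataset<String> statuses = inputs.map(file -> {
            writeCalls.incrementAndGet(); // stands in for the expensive clustering write
            return "status-of-" + file;
        });
        if (useCache) {
            statuses.cache();
        }
        statuses.count();   // first action, e.g. an earlier stage inspecting the statuses
        statuses.collect(); // second action, e.g. a later stage consuming them again
        return writeCalls.get();
    }

    public static void main(String[] args) {
        System.out.println("write calls without cache: " + runWrites(false)); // 6
        System.out.println("write calls with cache:    " + runWrites(true));  // 3
    }
}
{code}

Without cache() the two actions trigger 6 write calls (3 records, evaluated twice); with cache() they trigger 3. In real Spark code the analogous step is calling persist() on the RDD (for example JavaRDD.persist(StorageLevel.MEMORY_AND_DISK())) before its first action; this sketch does not show where, or whether, Hudi's clustering path should do so.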


> WriteStatus RDD is recalculated in cluster
> ------------------------------------------
>
>                 Key: HUDI-5289
>                 URL: https://issues.apache.org/jira/browse/HUDI-5289
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: spark
>            Reporter: zouxxyy
>            Priority: Major
>         Attachments: image-2022-11-29-10-24-08-853.png, 
> image-2022-11-29-10-25-29-546.png, image-2022-11-29-10-26-22-050.png



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
