[jira] [Updated] (HUDI-5289) WriteStatus RDD is recalculated in cluster

zouxxyy (Jira) Mon, 28 Nov 2022 19:09:04 -0800


     [ https://issues.apache.org/jira/browse/HUDI-5289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


zouxxyy updated HUDI-5289:
--------------------------
    Description: 
Step:
{code:java}
spark-submit \
--class org.apache.hudi.utilities.HoodieClusteringJob \
--conf spark.driver.memory=40G \
--conf spark.executor.instances=20 \
--conf spark.executor.memory=40G \
--conf spark.executor.cores=4 \
hudi-utilities-bundle_2.11-0.12.0.jar \
--props clusteringjob.properties \
--mode scheduleAndExecute \
--base-path xxx \
--table-name xxx \
--spark-memory 40g {code}
The following are the two stages of the job. Both are related to the 
calculation of WriteStatus, but some tasks in stage 96 were recalculated, 
which took more than ten minutes.

!image-2022-11-29-10-24-08-853.png|width=1560,height=57!

Here is stage 65:

!image-2022-11-29-10-25-29-546.png|width=640,height=515!

Here is stage 96:

!image-2022-11-29-10-26-22-050.png|width=643,height=435!
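
For readers unfamiliar with why the same work can show up in two stages: in 
Spark, an RDD that is consumed by more than one action is recomputed from its 
lineage each time unless it has been persisted and the cached data is still 
available. Whether that is the cause here is not established by this report. 
The sketch below is only a toy model of that behaviour in plain Python; 
LazyDataset and expensive_write are hypothetical names and are not Spark or 
Hudi APIs.

```python
from typing import Callable, List, Optional


class LazyDataset:
    """Toy stand-in for a lazily evaluated RDD (hypothetical, not a Spark or Hudi class)."""

    def __init__(self, compute: Callable[[], List[int]]) -> None:
        self._compute = compute  # the "lineage": how to rebuild the data
        self._cached: Optional[List[int]] = None
        self._persist = False
        self.compute_calls = 0  # how many times the lineage was executed

    def persist(self) -> "LazyDataset":
        # Ask for the result to be kept after the first materialization.
        self._persist = True
        return self

    def _materialize(self) -> List[int]:
        # Reuse the cached result if one exists; otherwise run the lineage again.
        if self._cached is not None:
            return self._cached
        self.compute_calls += 1
        data = self._compute()
        if self._persist:
            self._cached = data
        return data

    # Two "actions"; each one triggers materialization.
    def count(self) -> int:
        return len(self._materialize())

    def collect(self) -> List[int]:
        return list(self._materialize())


def expensive_write() -> List[int]:
    # Placeholder for the costly work that produces write statuses.
    return [n * n for n in range(5)]


# Without persist, each action re-runs the computation.
unpersisted = LazyDataset(expensive_write)
unpersisted.count()
unpersisted.collect()

# With persist, the second action reuses the first result.
persisted = LazyDataset(expensive_write).persist()
persisted.count()
persisted.collect()

print(unpersisted.compute_calls, persisted.compute_calls)
```

In this toy model the unpersisted dataset runs its computation once per 
action, while the persisted one runs it only once.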

  was:
Step:
{code:java}
spark-submit \
--class org.apache.hudi.utilities.HoodieClusteringJob \
--conf spark.driver.memory=40G \
--conf spark.executor.instances=20 \
--conf spark.executor.memory=40G \
--conf spark.executor.cores=4 \
hudi-utilities-bundle_2.11-0.12.0.jar \
--props clusteringjob.properties \
--mode scheduleAndExecute \
--base-path xxx \
--table-name xxx \
--spark-memory 40g {code}
The following are the two stages of spark calculation above job, they are all 
related to the calculation of WriteStatus, stage96 is recalculated

!image-2022-11-29-10-24-08-853.png|width=1560,height=57!

here is stage 65

!image-2022-11-29-10-25-29-546.png|width=640,height=515!

here is stage 96

!image-2022-11-29-10-26-22-050.png|width=643,height=435!


> WriteStatus RDD is recalculated in cluster
> ------------------------------------------
>
>                 Key: HUDI-5289
>                 URL: https://issues.apache.org/jira/browse/HUDI-5289
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: spark
>            Reporter: zouxxyy
>            Priority: Major
>         Attachments: image-2022-11-29-10-24-08-853.png, 
> image-2022-11-29-10-25-29-546.png, image-2022-11-29-10-26-22-050.png



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
