ft-bazookanu opened a new issue, #6970:
URL: https://github.com/apache/hudi/issues/6970

   Increasing `spark.executor.memory` or `spark.executor.cores` _worsens_ performance of the HUDI Exporter.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Run the HUDI exporter, varying `spark.executor.instances`, `spark.executor.memory`, and `spark.executor.cores`.
   
![image](https://user-images.githubusercontent.com/107943394/196262534-60be19aa-b161-4382-a920-fe0886311377.png)
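   For context, a run like the ones measured above would typically be launched through `spark-submit` with the snapshot exporter from the hudi-utilities bundle. The sketch below is illustrative only: the jar path, S3 paths, output format, and executor values are placeholder assumptions, not the exact command behind the screenshot.
   
   ```shell
   # Hypothetical invocation sketch -- jar path, S3 paths, and executor
   # values are placeholders, not the exact command from the runs above.
   spark-submit \
     --class org.apache.hudi.utilities.HoodieSnapshotExporter \
     --conf spark.executor.instances=10 \
     --conf spark.executor.memory=4g \
     --conf spark.executor.cores=2 \
     /usr/lib/hudi/hudi-utilities-bundle.jar \
     --source-base-path s3://my-bucket/path/to/hudi-table \
     --target-output-path s3://my-bucket/path/to/export \
     --output-format hudi
   ```
   
   Sweeping the three `--conf` values across runs while holding the rest constant would reproduce the kind of comparison shown in the screenshot.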
   
   
   **Expected behavior**
   1. Performance should not worsen if we increase `spark.executor.memory` and `spark.executor.cores` while keeping `spark.executor.instances` constant.
   
   We also hoped for better performance in general, on par with `s3 cp`. What can we do to improve the Exporter's performance?
   
   **Environment Description**
   
   * Hudi version : 0.10.1
   
   * Spark version : 3.1.2
   
   * Hive version : 3.1.3
   
   * Hadoop version : 3.2.1
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : yes
   
   
   **Additional context**
   
    - The total size of the exported data is 200GB.
    - The HUDI table has 500 partitions.
    - `.hoodie/` has 4000 objects.
    - The exporter is running on AWS EMR.
   
   

