ft-bazookanu opened a new issue, #6970: URL: https://github.com/apache/hudi/issues/6970
**Increasing spark.executor.memory or spark.executor.cores _worsens_ performance of HUDI Exporter**

**To Reproduce**

Steps to reproduce the behavior:

1. Run the HUDI exporter, varying `spark.executor.instances`, `spark.executor.memory`, and `spark.executor.cores`.

**Expected behavior**

1. Performance should not worsen when we increase `spark.executor.memory` and `spark.executor.cores` while keeping `spark.executor.instances` constant.

We also hoped for better performance in general, on par with `s3 cp`. What can we do to improve the Exporter's performance?

**Environment Description**

* Hudi version : 0.10.1
* Spark version : 3.1.2
* Hive version : 3.1.3
* Hadoop version : 3.2.1
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) : yes

**Additional context**

- The total size of the exported data is 200 GB.
- The HUDI table has 500 partitions.
- `.hoodie/` has 4000 objects.
- The exporter is running on AWS EMR.
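For reference, one run of the sweep described above looks roughly like the sketch below. This assumes the exporter is Hudi's `HoodieSnapshotExporter` from the `hudi-utilities-bundle`; the bucket paths, bundle version, and the concrete executor values are placeholders, not the reporter's actual settings:

```sh
# Sketch of a single run: only the three spark.executor.* settings are
# varied between runs; the exporter arguments stay fixed.
spark-submit \
  --class org.apache.hudi.utilities.HoodieSnapshotExporter \
  --conf spark.executor.instances=10 \
  --conf spark.executor.memory=8g \
  --conf spark.executor.cores=4 \
  hudi-utilities-bundle_2.12-0.10.1.jar \
  --source-base-path s3://bucket/path/to/hudi-table \
  --target-output-path s3://bucket/path/to/export \
  --output-format hudi
```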
