[GitHub] [hudi] yihua commented on issue #5481: [SUPPORT] Slow Upsert When Reloading Data into Hudi Table

GitBox Mon, 02 May 2022 13:27:26 -0700


yihua commented on issue #5481:
URL: https://github.com/apache/hudi/issues/5481#issuecomment-1115332685


   @MikeBuh Thanks for providing the detailed configs and Spark UI screenshots. Based on this information, it looks like each Spark executor is given more input data than it can comfortably process in memory. You could try the following Spark and Hudi configs and rerun the real-time version of the job. Assuming 7 GB per upsert batch, a parallelism of 70 gives each task roughly 100 MB of input:
   ```
   hoodie.upsert.shuffle.parallelism=70
   spark.sql.shuffle.partitions=70
   spark.default.parallelism=70
   ```
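   The value 70 follows from the assumed 7 GB batch. As a minimal sketch, assuming a target of about 100 MB of shuffle input per task (a common rule of thumb, not a Hudi requirement), the parallelism can be derived from the batch size and split into Spark configs and Hudi write options. The helpers `suggest_parallelism` and `build_configs` are hypothetical names used only for illustration.

```python
from typing import Dict, Tuple


def suggest_parallelism(batch_bytes: int, target_bytes_per_task: int = 100_000_000) -> int:
    """Return the number of tasks needed so no task exceeds the target input size."""
    if batch_bytes <= 0 or target_bytes_per_task <= 0:
        raise ValueError("sizes must be positive")
    # Integer ceiling division, avoiding floating-point rounding.
    return -(-batch_bytes // target_bytes_per_task)


def build_configs(parallelism: int) -> Tuple[Dict[str, str], Dict[str, str]]:
    # Spark configs: pass at submit time, e.g. --conf key=value.
    spark_conf = {
        "spark.sql.shuffle.partitions": str(parallelism),
        "spark.default.parallelism": str(parallelism),
    }
    # Hudi write option: pass to the writer, e.g. df.write.format("hudi").options(**hudi_opts).
    hudi_opts = {
        "hoodie.upsert.shuffle.parallelism": str(parallelism),
    }
    return spark_conf, hudi_opts


if __name__ == "__main__":
    # Assumed 100 MB target per task (rule of thumb); 7 GB batch in decimal units.
    p = suggest_parallelism(7_000_000_000)
    spark_conf, hudi_opts = build_configs(p)
    print(p)
    print(spark_conf)
    print(hudi_opts)
```

   The Hudi key is kept separate from the `spark.*` keys because `hoodie.*` settings are normally supplied as write options rather than as Spark session configs.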
   If you still hit the memory bottleneck, you may increase both the driver and executor memory:
   ```
   --driver-memory 12g
   --executor-memory 8g
   ```
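   As a sketch of how these settings could be combined into a single `spark-submit` invocation, the snippet below assembles the argument list. The script name `upsert_job.py` and the helper `spark_submit_args` are hypothetical; only the `--driver-memory`, `--executor-memory`, and `--conf` flags come from `spark-submit` itself.

```python
import shlex
from typing import Dict, List


def spark_submit_args(app: str, driver_memory: str, executor_memory: str,
                      spark_conf: Dict[str, str]) -> List[str]:
    """Build a spark-submit argument list with memory flags and --conf pairs."""
    args = ["spark-submit",
            "--driver-memory", driver_memory,
            "--executor-memory", executor_memory]
    for key, value in spark_conf.items():
        args += ["--conf", f"{key}={value}"]
    args.append(app)  # application script goes after all options
    return args


if __name__ == "__main__":
    cmd = spark_submit_args(
        "upsert_job.py",  # hypothetical application script
        driver_memory="12g",
        executor_memory="8g",
        spark_conf={"spark.sql.shuffle.partitions": "70",
                    "spark.default.parallelism": "70"},
    )
    print(shlex.join(cmd))
```

   Driver memory in particular should be set at submit time: in client mode the driver JVM is already running by the time application code executes, so setting it from inside the job has no effect.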