nsivabalan commented on issue #6811: URL: https://github.com/apache/hudi/issues/6811#issuecomment-1305153782
Few pointers:
- For the failed executors, check whether they are failing due to OOMs. If so, you may need to tune your Spark memory configs.
- I see you have set `cleaner.commits.retained = 1`. This cleans data files very aggressively, which could add to your write latency; consider relaxing it a bit. If you do, also relax https://hudi.apache.org/docs/configurations/#hoodiecleanmaxcommits accordingly.
- If you have a lot of small files in your lake, it might impact your index lookup time. You could enable clustering to batch many small files into larger ones; you can choose to run it async as well.
- If none of the above suggestions work, another option to consider is BUCKET_INDEX. That would mean starting with a new table altogether, but index lookup is expected to be O(1).
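As a rough sketch, the cleaner, clustering, and bucket-index settings above might look like the following Spark write options. The option keys come from the Hudi configuration docs; the table name and all numeric values here are illustrative assumptions, not tuned recommendations:

```python
# Illustrative Hudi write options -- values are examples only, tune for your workload.
hudi_options = {
    "hoodie.table.name": "my_table",  # hypothetical table name

    # Relax aggressive cleaning: retain more commits before data files are removed.
    "hoodie.cleaner.commits.retained": "10",
    # If you relax retained commits, relax the cleaner trigger frequency as well.
    "hoodie.clean.max.commits": "10",

    # Batch many small files into larger ones, asynchronously.
    "hoodie.clustering.async.enabled": "true",

    # Alternative (requires a brand-new table): bucket index for O(1) lookup.
    # "hoodie.index.type": "BUCKET",
}

# e.g. df.write.format("hudi").options(**hudi_options).mode("append").save(path)
print(hudi_options["hoodie.cleaner.commits.retained"])
print(hudi_options["hoodie.clustering.async.enabled"])
```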
