sumosha commented on issue #12412: URL: https://github.com/apache/hudi/issues/12412#issuecomment-2518205611
@ad1happy2go It does appear there is spill even in the faster commit (which would explain why that job stays consistent at around 15 minutes).

<img width="1692" alt="faster_commit_spill_executors" src="https://github.com/user-attachments/assets/c72dfa84-f8e7-4f25-888d-5b36690086e6">

I haven't been able to recreate the disk spill in my stress testing, so I assume it comes down to the size difference in the underlying table and the files being written (production is around 100 GB now; I started with a fresh table in testing and haven't built it up to a comparable size yet).

I was planning to experiment with the setting mentioned in your guide, `hoodie.memory.merge.fraction`. Does this seem like the right track? I'm also wondering whether a larger instance size is warranted as the table grows (perhaps fewer instances to keep a comparable core count).

We are currently on the default collector in EMR (Parallel) in production. I have switched to G1 (this is JDK 17) in stress testing, though I didn't see much change in the overall commit times. We'll move forward with G1 since it's recommended anyway.
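For reference, here is a minimal sketch of how the two knobs discussed above could be wired together: raising `hoodie.memory.merge.fraction` as a Hudi write option and enabling G1 via Spark executor JVM options. The fraction value, table name, path, and DataFrame are all hypothetical placeholders, not recommendations.

```python
# Hypothetical PySpark fragment tying together the two tunings discussed:
# a larger merge-memory fraction for Hudi and G1GC on the executor JVMs.
# All values and names below are illustrative.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hudi-merge-fraction-sketch")
    # Switch executor/driver JVMs to G1 (Parallel is the EMR default).
    .config("spark.executor.extraJavaOptions", "-XX:+UseG1GC")
    .config("spark.driver.extraJavaOptions", "-XX:+UseG1GC")
    .getOrCreate()
)

(
    df.write.format("hudi")  # `df` is an existing DataFrame (hypothetical)
    .option("hoodie.table.name", "my_table")  # hypothetical table name
    # Raise the fraction of available memory Hudi may use for the merge
    # during upserts; 0.75 here is an arbitrary example value.
    .option("hoodie.memory.merge.fraction", "0.75")
    .mode("append")
    .save("s3://bucket/path/to/table")  # hypothetical location
)
```

Whether raising the fraction helps depends on where the spill actually occurs; it only governs the merge step, so executor sizing may still matter independently.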
