ad1happy2go commented on issue #10456:
URL: https://github.com/apache/hudi/issues/10456#issuecomment-1918948729

   @xicm @danny0405 I had a discussion with @maheshguptags. Let me try to 
summarise his issue.
   
   He has around 5000 partitions in total and is using the bucket index. 
When he sets the parallelism (write.tasks) to 20, the job takes 1:45 mins, and 
when it is 100, it takes 35 mins.
   
   But as the parallelism increases, the number of file groups explodes, as 
expected. This results in a lot of small file groups with very few records each 
(~20), which ultimately causes an OOM due to 400MB commit files.
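   
   To make the scaling concrete, here is a rough back-of-the-envelope sketch. 
It assumes a simplified worst case in which every write task can end up owning 
its own file group in every partition. That model and the function name 
`estimate_file_groups` are my assumptions for illustration, not Hudi's actual 
bucket assignment logic.

```python
# Rough worst-case model (an assumption, not Hudi's real bucket assignment):
# every write task may create one file group in every partition it writes to.

def estimate_file_groups(partitions: int, write_tasks: int) -> int:
    """Upper bound on file groups under the simplified model above."""
    return partitions * write_tasks


PARTITIONS = 5000  # partition count reported in the issue

low = estimate_file_groups(PARTITIONS, write_tasks=20)
high = estimate_file_groups(PARTITIONS, write_tasks=100)
growth = high / low

print(f"write.tasks=20  -> up to {low:,} file groups")
print(f"write.tasks=100 -> up to {high:,} file groups")
print(f"growth factor: {growth:.1f}x")
```

   Under this model the file group count grows linearly with write.tasks, so 
the commit metadata that has to list those file groups grows with it.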

