ad1happy2go commented on issue #10456: URL: https://github.com/apache/hudi/issues/10456#issuecomment-1918948729
@xicm @danny0405 I had a discussion with @maheshguptags. Let me try to summarise his issue. He has around 5000 partitions in total and is using the bucket index. When he uses a parallelism (write.tasks) of 20, the job takes 1:45 mins, and when it is 100 it takes 35 mins. But as parallelism increases, the number of file groups explodes, as expected. This results in a lot of small file groups with very few records each (~20), which ultimately causes OOM due to 400MB commit files.
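
To make the file-group arithmetic concrete, here is a rough back-of-the-envelope sketch. It assumes the bucket index creates at most one file group per (partition, bucket) pair. The bucket counts and the total record count below are hypothetical values chosen for illustration, not numbers from the job; only the 5000 partitions come from the description above.

```python
def max_file_groups(partitions: int, buckets_per_partition: int) -> int:
    """Upper bound on file groups, assuming one file group per (partition, bucket)."""
    return partitions * buckets_per_partition


def avg_records_per_group(total_records: int, partitions: int,
                          buckets_per_partition: int) -> float:
    """Average records per file group if records spread evenly (an idealisation)."""
    return total_records / max_file_groups(partitions, buckets_per_partition)


PARTITIONS = 5000            # from the issue description
TOTAL_RECORDS = 2_000_000    # hypothetical, for illustration only

for buckets in (4, 20, 100):  # hypothetical bucket counts per partition
    groups = max_file_groups(PARTITIONS, buckets)
    per_group = avg_records_per_group(TOTAL_RECORDS, PARTITIONS, buckets)
    print(f"buckets={buckets:>3}  file_groups<={groups:>7}  avg_records/group={per_group:.1f}")
```

Every file group touched by a commit adds an entry to the commit metadata, so under these assumptions the commit file grows roughly in proportion to the file-group count, while the records per group shrink.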
