ChiehFu commented on issue #10121: URL: https://github.com/apache/hudi/issues/10121#issuecomment-1815215853
@ad1happy2go I was able to find this table via the Hudi CLI, which I think displays information similar to what is in the Hudi commit file:

[Screenshot of Hudi CLI output: https://github.com/apache/hudi/assets/11819388/fb32cd55-3552-4129-b798-4a7963276612]

The explanation makes sense. I think the tables where we observe this slowness are either tables with a single partition, or tables where incremental updates touch the majority of partitions.

The part I can't figure out is this: those tables were initially created as Hudi 0.8 tables, and they had been running incremental upserts for quite a while without any sign of slowness in the writing stage. Only after we recently upgraded them to Hudi 0.12.3 did we start seeing slowness in the writing stage, even though there has been no significant change in the characteristics of the incremental updates. So two questions came to mind:

- Is the data writing stage somehow slower in Hudi 0.12 than in Hudi 0.8 when multiple partitions/file groups are touched?
- Is there any fundamental difference between Hudi 0.8 and Hudi 0.12 in how records get bin-packed into file groups, such that write amplification for incremental upserts is significantly larger in Hudi 0.12 than in Hudi 0.8, even though the characteristics of the incremental updates haven't changed much?
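To make the write-amplification concern concrete, here is a rough back-of-the-envelope sketch. It assumes a copy-on-write table, where every file group that receives at least one update has its base file rewritten in full, so the bytes rewritten scale with the number of touched file groups rather than with the number of updated records. `FileGroup` and `cow_write_amplification` are hypothetical illustration names (not Hudi APIs), and the sizes are made-up example numbers, not measurements from our tables.

```python
from dataclasses import dataclass


@dataclass
class FileGroup:
    # Hypothetical model of one file group (illustration only, not a Hudi class).
    size_bytes: int        # size of the current base file
    updated_records: int   # incoming updates routed to this file group


def cow_write_amplification(file_groups, record_size_bytes):
    """Bytes rewritten divided by bytes of incoming updates.

    Assumes copy-on-write semantics: every file group with at least one
    update has its whole base file rewritten.
    """
    rewritten = sum(fg.size_bytes for fg in file_groups if fg.updated_records > 0)
    incoming = sum(fg.updated_records for fg in file_groups) * record_size_bytes
    return rewritten / incoming if incoming else 0.0


# Same 1,000 updates of ~1 KB each against 100 file groups of ~120 MB each.
concentrated = [FileGroup(120_000_000, 1000)] + [FileGroup(120_000_000, 0) for _ in range(99)]
spread = [FileGroup(120_000_000, 10) for _ in range(100)]

print(cow_write_amplification(concentrated, 1000))  # updates land in one file group
print(cow_write_amplification(spread, 1000))        # updates touch every file group
```

With the same update volume, concentrating the updates in one file group gives an amplification of 120x, while spreading them across all 100 file groups gives 12,000x. That is why a single large partition, or updates that touch most partitions, would be expected to make the writing stage expensive regardless of Hudi version; it does not by itself explain a difference between 0.8 and 0.12.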
