ChiehFu commented on issue #10121:
URL: https://github.com/apache/hudi/issues/10121#issuecomment-1815215853

   @ad1happy2go 
   I was able to find this table via the Hudi CLI, which I think displays 
information similar to what is in the Hudi commit file:
   <img width="1407" alt="image" 
src="https://github.com/apache/hudi/assets/11819388/fb32cd55-3552-4129-b798-4a7963276612">
   
   The explanation makes sense. I think the tables where this kind of slowness 
is observed either have a single partition or receive incremental updates that 
touch the majority of partitions. 
   
   The part I couldn't figure out is that these tables were initially created 
as Hudi 0.8 tables and had been running incremental upserts for quite a while 
without any sign of slowness in the writing stage. It was only after we recently 
upgraded them to Hudi 0.12.3 that we started seeing slowness in the writing 
stage, even though the characteristics of the incremental updates have not 
changed significantly. 
   
   So two questions come to mind:
   - Is the data writing stage somehow slower in Hudi 0.12 than in Hudi 0.8 
when multiple partitions/file groups are touched?
   - Is there any fundamental difference between Hudi 0.8 and Hudi 0.12 in 
how records get bin-packed into file groups, such that write amplification for 
incremental upserts is significantly larger in Hudi 0.12 than in Hudi 0.8, even 
though the characteristics of the incremental updates haven't changed much?
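
   To make the write-amplification question concrete, here is a minimal sketch 
of how I am thinking about it. It assumes a copy-on-write table where an upsert 
rewrites the whole base file of every touched file group, and it ignores the 
size of the new records themselves. The function name and all sizes are 
hypothetical illustrations, not Hudi APIs or measurements from our tables.

```python
def cow_write_amplification(touched_file_sizes, updated_bytes):
    """Bytes rewritten divided by bytes of incoming updates.

    Assumption: every touched file group's base file is rewritten in full
    (copy-on-write behaviour); sizes are in bytes.
    """
    if updated_bytes <= 0:
        raise ValueError("updated_bytes must be positive")
    return sum(touched_file_sizes) / updated_bytes


MB = 1024 * 1024

# Hypothetical batch: 10 MB of updates, base files of 120 MB each.
# Same update volume, but landing in 2 file groups versus 50 file groups.
concentrated = cow_write_amplification([120 * MB] * 2, 10 * MB)
scattered = cow_write_amplification([120 * MB] * 50, 10 * MB)

print(f"concentrated: {concentrated:.0f}x, scattered: {scattered:.0f}x")
```

   Under these assumptions, the same batch costs far more when it is spread 
over many file groups, which is why a change in how records are assigned to 
file groups between versions could matter even if the updates themselves look 
the same.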


