[GitHub] [hudi] guanziyue commented on pull request #6612: [RFC-58][HUDI-4790] a more effective HoodieMergeHandler for COW table with parquet

GitBox Wed, 14 Dec 2022 22:31:26 -0800


guanziyue commented on PR #6612:
URL: https://github.com/apache/hudi/pull/6612#issuecomment-1352617591


   > > @loukey-lj: can you respond to @guanziyue's comment above? I will 
review this patch this week.
   > 
   > Yes, this optimization is applicable to other frameworks. For Hudi, its 
advantage is that it can get the row groups and store them in the index while 
updating the index. For schema evolution, we currently only support adding 
fields. Different row groups in a Parquet file can have different schemas, but 
the query side is not aware of this. If schema changes are not considered, I 
can submit a small demo.
   
   Thanks for your reply. I agree that, in theory, this idea can improve 
performance a lot. My concern is that the current Parquet implementation or 
its interfaces cannot fully support it. Looking forward to this RFC!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
