guanziyue commented on PR #6612: URL: https://github.com/apache/hudi/pull/6612#issuecomment-1352617591
> > @loukey-lj: can you respond to @guanziyue's comment above? I will review this patch this week.
>
> Yes, this optimization is applicable to other frameworks. For Hudi, its advantage is that it can get the row groups and store them in the index while updating the index. For schema evolution, we currently only support adding fields. Different row groups in a Parquet file can have different schemas, but this is unknown to the query side. If schema changes are not considered, I can submit a small demo.

Thanks for your reply. I agree that this idea could improve performance a lot in theory. My worry is that the current Parquet implementation or interface cannot fully support it. Looking forward to this RFC!
