yihua commented on issue #6940: URL: https://github.com/apache/hudi/issues/6940#issuecomment-1363650663
@matthiasdg Thanks for raising this performance issue. We've recently landed several fixes on the latest master to address the slowness in the Hudi file index:

(1) Avoid file index and use fs view cache in COW input format #7493, cherry-picked for the 0.12.2 release
(2) Turn off metadata-table-based file listing in BaseHoodieTableFileIndex #7488, cherry-picked for the 0.12.2 release
(3) Lazy fetching partition path & file slice for HoodieFileIndex #6680, targeted for the 0.13.0 release
(4) Fixing FileIndex impls to properly batch partitions listing #7233, targeted for the 0.13.0 release

@konradwudkowski We've verified that with 0.12.2 RC1, which contains the first two fixes, queries using the Trino Hive connector should now be on par with old releases (more than 10x faster than 0.12.1).

@matthiasdg These fixes should also fix the slowness of file listing for queries in Spark. A few community users have already verified this with the master branch.

I'm going to close this issue now. @konradwudkowski @matthiasdg If you still observe the same performance problem, feel free to reopen this GitHub issue and we'll triage it again.
