yihua commented on issue #6940:
URL: https://github.com/apache/hudi/issues/6940#issuecomment-1363650663

   @matthiasdg Thanks for raising this performance issue.  We've recently 
merged a few fixes into master that address it in the Hudi file index:
   (1) Avoid file index and use fs view cache in COW input format (#7493), 
cherry-picked for the 0.12.2 release
   (2) Turn off metadata-table-based file listing in BaseHoodieTableFileIndex 
(#7488), cherry-picked for the 0.12.2 release
   (3) Lazy fetching partition path & file slice for HoodieFileIndex (#6680), 
targeted for the 0.13.0 release
   (4) Fixing FileIndex impls to properly batch partitions listing (#7233), 
targeted for the 0.13.0 release
   
   @konradwudkowski We've verified that with 0.12.2 RC1, which contains the 
first two fixes, queries using the Trino Hive connector should now be on par 
with older releases (more than 10x faster than 0.12.1).
   
   @matthiasdg These fixes should also resolve the slow file listing for 
queries in Spark.  A few community users have already verified this on the 
master branch.
   
   I'm going to close this issue now.  @konradwudkowski @matthiasdg if you 
still observe the same performance problem, feel free to reopen this GitHub 
issue.  We'll triage it again.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
