Yuwei Xiao created HUDI-4812:
--------------------------------
Summary: Delay file groups fetching after partition prune in Spark Query
Key: HUDI-4812
URL: https://issues.apache.org/jira/browse/HUDI-4812
Project: Apache Hudi
Issue Type: Improvement
Reporter: Yuwei Xiao
In the current Spark query implementation, the FileIndex refreshes and loads
all file groups into its cache in order to serve subsequent queries.
For a large table with many partitions, this may introduce significant overhead
during initialization. Meanwhile, the query itself may come with a partition
filter, in which case loading the file groups of every partition is unnecessary.
So, to optimize, the whole refresh logic will become lazy: the actual work
will be carried out only after the partition filter has been applied.
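
A minimal sketch of the lazy approach described above, in plain Java. The class
LazyFileIndex, its methods, and the string-based partition and file group names
are all hypothetical and only illustrate the idea; they are not Hudi's or
Spark's actual FileIndex API. Here refresh() only invalidates the cache, and
file groups are listed (and cached) only for partitions that pass the filter.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

/** Hypothetical illustration only; not a class from Hudi or Spark. */
class LazyFileIndex {
    private final List<String> allPartitions;
    // Assumed stand-in for the expensive per-partition file group listing.
    private final Function<String, List<String>> fileGroupLister;
    // Partition path -> file groups, filled on demand.
    private final Map<String, List<String>> cache = new HashMap<>();

    LazyFileIndex(List<String> allPartitions,
                  Function<String, List<String>> fileGroupLister) {
        this.allPartitions = allPartitions;
        this.fileGroupLister = fileGroupLister;
    }

    /** Lazy refresh: drop cached entries, but do not list anything yet. */
    void refresh() {
        cache.clear();
    }

    /** Prune partitions first, then load file groups only for the survivors. */
    Map<String, List<String>> listFileGroups(Predicate<String> partitionFilter) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String partition : allPartitions) {
            if (partitionFilter.test(partition)) {
                // The lister runs only if this partition is not cached yet.
                result.put(partition,
                           cache.computeIfAbsent(partition, fileGroupLister));
            }
        }
        return result;
    }

    /** Number of partitions whose file groups have actually been loaded. */
    int loadedPartitionCount() {
        return cache.size();
    }
}
```

With this shape, a query that filters on a single partition triggers one
listing call instead of one per partition, and a repeated query on the same
partition is served from the cache until the next refresh().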
--
This message was sent by Atlassian Jira
(v8.20.10#820010)