[jira] [Created] (HUDI-4812) Delay file groups fetching after partition prune in Spark Query

Yuwei Xiao (Jira) Thu, 08 Sep 2022 00:28:05 -0700

Yuwei Xiao created HUDI-4812:
--------------------------------

             Summary: Delay file groups fetching after partition prune in Spark Query
                 Key: HUDI-4812
                 URL: https://issues.apache.org/jira/browse/HUDI-4812
             Project: Apache Hudi
          Issue Type: Improvement
            Reporter: Yuwei Xiao



In the current Spark query implementation, the FileIndex refreshes and loads 
all file groups into its cache in order to serve subsequent queries.

 

For a large table with many partitions, this can add significant overhead 
during initialization. Meanwhile, the query itself may come with a partition 
filter, in which case loading the file groups of the filtered-out partitions 
is unnecessary.

 

To optimize this, the whole refresh logic will become lazy: the actual loading 
of file groups will be carried out only after the partition filter is applied.
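
As a rough illustration of the idea (not the actual Hudi code), the sketch 
below shows a toy index that prunes partitions first and only then loads file 
groups for the surviving partitions. The class LazyFileIndex, the callback 
load_file_groups, and the partition names are all hypothetical and exist only 
for this example.

```python
from typing import Callable, Dict, List


class LazyFileIndex:
    """Toy index that defers file group loading until after partition pruning.

    Hypothetical illustration only; this is not Hudi's FileIndex API.
    """

    def __init__(self, partitions: List[str],
                 load_file_groups: Callable[[str], List[str]]):
        # Only the cheap partition list is kept at construction time.
        self._partitions = partitions
        self._load_file_groups = load_file_groups
        self._cache: Dict[str, List[str]] = {}
        # Records which partitions were actually loaded, for inspection.
        self.loaded_partitions: List[str] = []

    def list_files(self, partition_filter: Callable[[str], bool]) -> Dict[str, List[str]]:
        # Step 1: prune partitions using the query's partition filter.
        pruned = [p for p in self._partitions if partition_filter(p)]
        # Step 2: load file groups only for partitions that survived pruning,
        # reusing cached results from earlier queries.
        result: Dict[str, List[str]] = {}
        for partition in pruned:
            if partition not in self._cache:
                self._cache[partition] = self._load_file_groups(partition)
                self.loaded_partitions.append(partition)
            result[partition] = self._cache[partition]
        return result


def fake_loader(partition: str) -> List[str]:
    # Stand-in for the expensive file system listing (assumed for the example).
    return [f"{partition}/fg-0", f"{partition}/fg-1"]


index = LazyFileIndex(
    ["dt=2022-09-06", "dt=2022-09-07", "dt=2022-09-08"], fake_loader
)
files = index.list_files(lambda p: p == "dt=2022-09-08")
```

In this sketch, constructing the index does no listing at all, and a query 
filtering on a single partition triggers the loader for that partition only. 
Later queries on the same partition are served from the cache.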



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
