[ 
https://issues.apache.org/jira/browse/HUDI-8463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Y Ethan Guo updated HUDI-8463:
------------------------------
    Description: When a snapshot query is planned, there are cases where the 
completion time must be looked up from the instant time. This lookup can be a 
performance bottleneck, especially when there are a huge number of files and a 
large number of instants to look up across both the archived and active 
timelines.  We should see whether this can be improved by storing the 
completion time of each file in the FILES partition of the metadata table 
(MDT), avoiding an expensive lookup every time.  Once the completion time of 
each file is stored in the FILES partition of the MDT, query planning only 
needs to filter based on the information from the MDT.

> Revisit snapshot query planning performance regarding completion time
> ---------------------------------------------------------------------
>
>                 Key: HUDI-8463
>                 URL: https://issues.apache.org/jira/browse/HUDI-8463
>             Project: Apache Hudi
>          Issue Type: Improvement
>            Reporter: Y Ethan Guo
>            Priority: Blocker
>             Fix For: 1.0.0
>
>
> When a snapshot query is planned, there are cases where the completion time 
> must be looked up from the instant time. This lookup can be a performance 
> bottleneck, especially when there are a huge number of files and a large 
> number of instants to look up across both the archived and active 
> timelines.  We should see whether this can be improved by storing the 
> completion time of each file in the FILES partition of the metadata table 
> (MDT), avoiding an expensive lookup every time.  Once the completion time of 
> each file is stored in the FILES partition of the MDT, query planning only 
> needs to filter based on the information from the MDT.
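
The difference between the two approaches described in the issue can be 
illustrated with a minimal sketch. This is not Hudi code: `FileEntry`, its 
`completion_time` field, and both functions are hypothetical names invented 
for illustration, and a plain dict stands in for the combined active and 
archived timeline. Timestamps are assumed to be fixed-width strings, so they 
compare correctly as text.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class FileEntry:
    """Hypothetical stand-in for a file record in the FILES partition."""
    name: str
    instant_time: str
    # Assumption: the proposal would persist this value alongside the file.
    completion_time: Optional[str] = None


def visible_files_via_timeline(
    files: List[FileEntry], timeline: Dict[str, str], as_of: str
) -> List[str]:
    """Current shape of the problem: resolve each file's completion time
    by looking up its instant time in the (active + archived) timeline."""
    visible = []
    for f in files:
        completed = timeline.get(f.instant_time)  # one lookup per file
        if completed is not None and completed <= as_of:
            visible.append(f.name)
    return visible


def visible_files_via_metadata(files: List[FileEntry], as_of: str) -> List[str]:
    """Proposed shape: the completion time is already stored with each
    file entry, so planning is a plain filter with no timeline lookup."""
    return [
        f.name
        for f in files
        if f.completion_time is not None and f.completion_time <= as_of
    ]


# Illustrative data only: instant time -> completion time.
timeline = {"20240101000000000": "20240101000500000",
            "20240102000000000": "20240102000900000"}
files = [
    FileEntry("f1.parquet", "20240101000000000", "20240101000500000"),
    FileEntry("f2.parquet", "20240102000000000", "20240102000900000"),
    FileEntry("f3.parquet", "20240103000000000", None),  # not yet completed
]
as_of = "20240101235959999"
```

Both functions select the same files for a given `as_of` time; the second 
one simply avoids the per-file timeline lookup, which is the cost the issue 
proposes to eliminate.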



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
