[
https://issues.apache.org/jira/browse/HUDI-8463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Y Ethan Guo updated HUDI-8463:
------------------------------
Description: When a snapshot query is planned, some cases require looking up
the completion time of an instant from its instant time. This can be a
performance bottleneck, especially when there are a huge number of files and a
large number of instants to look up across both the archived and active
timelines. We should see if this can be improved by storing the completion
time of each file in the FILES partition of the metadata table (MDT) to avoid
an expensive lookup every time. Once the completion time of each file is
stored in the FILES partition of the MDT, filtering can rely on the
information from the MDT alone.
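
The difference between the two approaches can be illustrated with a minimal
sketch. It is written in Python for brevity and is not Hudi code: FileEntry,
plan_with_timeline_lookup and plan_with_stored_completion_time are
hypothetical names, the dictionary stands in for a lookup over the active and
archived timelines, and timestamps are assumed to be fixed-width strings that
compare correctly as text.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class FileEntry:
    """Hypothetical FILES-partition record; not Hudi's actual schema."""
    name: str
    instant_time: str                      # instant that wrote the file
    completion_time: Optional[str] = None  # proposed extra field (assumption)


def plan_with_timeline_lookup(
    files: List[FileEntry],
    completion_by_instant: Dict[str, str],
    as_of: str,
) -> List[str]:
    """Simplified current behaviour: one timeline lookup per file."""
    selected = []
    for f in files:
        # Stands in for searching the active and archived timelines.
        completed = completion_by_instant.get(f.instant_time)
        if completed is not None and completed <= as_of:
            selected.append(f.name)
    return selected


def plan_with_stored_completion_time(
    files: List[FileEntry],
    as_of: str,
) -> List[str]:
    """Proposed behaviour: filter on the value stored next to each file."""
    selected = []
    for f in files:
        if f.completion_time is not None and f.completion_time <= as_of:
            selected.append(f.name)
    return selected
```

In this sketch both functions select the same files as long as the stored
completion times agree with the timeline; the second one simply never touches
the timeline, which is the saving the proposal is after.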
> Revisit snapshot query planning performance regarding completion time
> ---------------------------------------------------------------------
>
> Key: HUDI-8463
> URL: https://issues.apache.org/jira/browse/HUDI-8463
> Project: Apache Hudi
> Issue Type: Improvement
> Reporter: Y Ethan Guo
> Priority: Blocker
> Fix For: 1.0.0
>
>
> When a snapshot query is planned, some cases require looking up the
> completion time of an instant from its instant time. This can be a
> performance bottleneck, especially when there are a huge number of files and
> a large number of instants to look up across both the archived and active
> timelines. We should see if this can be improved by storing the completion
> time of each file in the FILES partition of the metadata table (MDT) to avoid
> an expensive lookup every time. Once the completion time of each file is
> stored in the FILES partition of the MDT, filtering can rely on the
> information from the MDT alone.