[
https://issues.apache.org/jira/browse/HUDI-3776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alexey Kudinkin updated HUDI-3776:
----------------------------------
Description:
Currently, BloomIndex tries to rely solely on Column Stats to look up record
locations. This is incorrect, since the Column Stats (CS) state might not be
complete at any given moment. Instead, we should use it on a best-effort basis
(not assuming that it has any records at all), and for those files that are
not found in ColStats we should fall back to reading from them directly.
You can search the code for "HUDI-3776" to see the exact code locations this
is related to.
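
The best-effort lookup described above could be sketched roughly as follows.
This is only an illustration, not the actual Hudi code: it assumes the column
stats in question are per-file min/max record-key ranges, and every name in it
(BestEffortRangeLookup, KeyRange, resolveRanges, candidateFiles, the
readFromFile callback) is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch; none of these names are Hudi APIs.
class BestEffortRangeLookup {

    /** Min/max record-key range of a single file (illustrative type). */
    static final class KeyRange {
        final String minKey;
        final String maxKey;

        KeyRange(String minKey, String maxKey) {
            this.minKey = minKey;
            this.maxKey = maxKey;
        }

        /** True if the key falls inside [minKey, maxKey]. */
        boolean mayContain(String key) {
            return minKey.compareTo(key) <= 0 && key.compareTo(maxKey) <= 0;
        }
    }

    /**
     * Resolves a key range for every file. Column stats are consulted first,
     * but a missing entry is not treated as "file has no records": the range
     * is read from the file itself instead.
     */
    static Map<String, KeyRange> resolveRanges(
            List<String> files,
            Map<String, KeyRange> colStats,
            Function<String, KeyRange> readFromFile) {
        Map<String, KeyRange> resolved = new LinkedHashMap<>();
        for (String file : files) {
            KeyRange range = colStats.get(file);
            if (range == null) {
                // Column stats may be incomplete, so fall back to the file.
                range = readFromFile.apply(file);
            }
            resolved.put(file, range);
        }
        return resolved;
    }

    /** Returns the files whose key range may contain the given record key. */
    static List<String> candidateFiles(String key, Map<String, KeyRange> ranges) {
        List<String> candidates = new ArrayList<>();
        for (Map.Entry<String, KeyRange> entry : ranges.entrySet()) {
            if (entry.getValue().mayContain(key)) {
                candidates.add(entry.getKey());
            }
        }
        return candidates;
    }
}
```

The point of the sketch is that a file absent from the stats map still ends up
in the resolved map, so it remains a lookup candidate instead of being
silently skipped.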
was:
Currently, BloomIndex tries to rely solely on Column Stats to lookup records
locations. This is however incorrect, since CS state might not be complete at
any given moment; instead we should use it on the basis of best effort (not
assuming that it would have any record at all), and for those files that are
not found in ColStats we should list from them directly.
> Fix BloomIndex incorrectly using ColStats to lookup records locations
> ---------------------------------------------------------------------
>
> Key: HUDI-3776
> URL: https://issues.apache.org/jira/browse/HUDI-3776
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Alexey Kudinkin
> Assignee: Sagar Sumit
> Priority: Blocker
> Fix For: 0.11.0
>