[
https://issues.apache.org/jira/browse/HUDI-9551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Davis Zhang reassigned HUDI-9551:
---------------------------------
Assignee: Davis Zhang
> Secondary index lookup in table version 9 is not prefix lookup
> --------------------------------------------------------------
>
> Key: HUDI-9551
> URL: https://issues.apache.org/jira/browse/HUDI-9551
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Davis Zhang
> Assignee: Davis Zhang
> Priority: Major
>
> https://issues.apache.org/jira/browse/HUDI-9505
> For the new SI semantics, please refer to the above Jira and its PR description.
>
> When doing the lookup, we have:
> * keyToLookup: the raw value passed down by the caller. The value is
> not escaped and can be null.
> * hfile
> The flow is:
> escape keyToLookup -> sort -> hfile lookup
>
> The hfile lookup involves some key matching rules, which as of today include:
> * Full key lookup: RawKeyFromHfile.equals(keyToLookupEscaped)
> * Prefix lookup: RawKeyFromHfile.startsWith(keyToLookupEscaped)
>
> The new index lookup does not fall into either bucket, because what we
> are doing is
> extractUnescapedSecondaryKey(RawKeyFromHfile).equals(keyToLookupEscaped)
> This is SI-specific logic, and we should not use plain prefix lookup because
> its behavior is not the same and can cause correctness issues. We should
> extract a lambda function for key matching.
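
A minimal sketch of what a pluggable key matcher could look like, to illustrate why plain prefix lookup and SI matching diverge. Everything here is hypothetical: the class name, the `secondaryKeyPart` helper (a stand-in for the `extractUnescapedSecondaryKey` mentioned above), and the assumed key layout `<escapedSecondaryKey>$<escapedRecordKey>` with backslash escaping. It is not the actual Hudi implementation.

```java
import java.util.function.BiPredicate;

public class KeyMatchSketch {
    // ASSUMPTION: a secondary-index key is <escapedSecondaryKey>$<escapedRecordKey>,
    // and a literal '$' or '\' inside either part is escaped with a backslash.
    static final char SEPARATOR = '$';
    static final char ESCAPE = '\\';

    // Hypothetical helper: returns the part of rawKey before the first
    // separator that is not escaped, still in its escaped form.
    static String secondaryKeyPart(String rawKey) {
        for (int i = 0; i < rawKey.length(); i++) {
            char c = rawKey.charAt(i);
            if (c == ESCAPE) {
                i++; // skip the character that follows the escape
            } else if (c == SEPARATOR) {
                return rawKey.substring(0, i);
            }
        }
        return rawKey; // no unescaped separator found
    }

    // Each matcher takes (rawKeyFromHfile, keyToLookupEscaped).
    static final BiPredicate<String, String> FULL_KEY = String::equals;
    static final BiPredicate<String, String> PREFIX = String::startsWith;
    static final BiPredicate<String, String> SECONDARY_KEY =
        (raw, lookup) -> secondaryKeyPart(raw).equals(lookup);

    public static void main(String[] args) {
        String lookup = "a";
        for (String raw : new String[] {"a$rk1", "ab$rk2"}) {
            System.out.println(raw
                + " prefix=" + PREFIX.test(raw, lookup)
                + " secondary=" + SECONDARY_KEY.test(raw, lookup));
        }
    }
}
```

Under these assumptions, looking up `a` with the prefix matcher also accepts `ab$rk2`, while the secondary-key matcher accepts only `a$rk1`. Passing the matcher in as a lambda keeps the SI-specific rule out of the generic hfile scan.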
>
> The hfile lookup involves 2 stages:
> - given keyToLookupEscaped, we first need to locate the data block, which
> requires key order comparison. Here we need
> compare(keyToLookupEscaped, RawKeyFromHfile).
> Now it will become
> {code:java}
> compare(extractUnescapedSecondaryKey(keyToLookupEscaped), RawKeyFromHfile)
> {code}
> - once the data blocks are located, we do a sequential scan to find the exact
> key. Previously it was
> RawKeyFromHfile.equals(keyToLookupEscaped) or
> RawKeyFromHfile.startsWith(keyToLookupEscaped); now it will become
>
>
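
The two stages described above (locate the block by ordered comparison, then scan sequentially with a matching rule) can be sketched roughly as follows. This is an illustration under stated assumptions, not Hudi's HFile reader: blocks are modeled as non-empty sorted lists of strings, ordering uses `String.compareTo` rather than byte comparison, and the secondary-key matcher in `main` ignores escaping. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;

public class TwoStageLookupSketch {
    // Stage 1: binary search over each block's first key. Returns the index of
    // the last block whose first key is <= lookupKey, or 0 if all start later.
    static int locateBlock(List<List<String>> blocks, String lookupKey) {
        int lo = 0, hi = blocks.size() - 1, found = 0;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (blocks.get(mid).get(0).compareTo(lookupKey) <= 0) {
                found = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return found;
    }

    // Stage 2: sequential scan from the located block, applying the matcher
    // (rawKeyFromHfile, keyToLookupEscaped) to every key it visits.
    static List<String> lookup(List<List<String>> blocks, String lookupKey,
                               BiPredicate<String, String> matcher) {
        List<String> hits = new ArrayList<>();
        for (int b = locateBlock(blocks, lookupKey); b < blocks.size(); b++) {
            for (String raw : blocks.get(b)) {
                if (matcher.test(raw, lookupKey)) {
                    hits.add(raw);
                } else if (raw.compareTo(lookupKey) > 0 && !raw.startsWith(lookupKey)) {
                    // Keys are sorted, so no later key can share the lookup prefix.
                    return hits;
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<List<String>> blocks = Arrays.asList(
            Arrays.asList("a$rk1", "a$rk2"),
            Arrays.asList("ab$rk3", "b$rk4"));
        // ASSUMPTION: simplified SI rule with no escaping, for illustration only.
        BiPredicate<String, String> secondaryKey = (raw, key) -> raw.startsWith(key + "$");
        System.out.println(lookup(blocks, "a", String::startsWith));
        System.out.println(lookup(blocks, "a", secondaryKey));
    }
}
```

With this sample data the prefix matcher returns `[a$rk1, a$rk2, ab$rk3]` and the secondary-key matcher returns `[a$rk1, a$rk2]`. Note that the scan cannot stop at the first non-matching key: a key that shares the prefix but fails the SI rule may still be followed by matching keys, so the stop condition here only fires once the prefix itself no longer matches.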
--
This message was sent by Atlassian Jira
(v8.20.10#820010)