[ https://issues.apache.org/jira/browse/HUDI-9551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Davis Zhang reassigned HUDI-9551:
---------------------------------

    Assignee: Davis Zhang

> Secondary index lookup in table version 9 is not prefix lookup
> --------------------------------------------------------------
>
>                 Key: HUDI-9551
>                 URL: https://issues.apache.org/jira/browse/HUDI-9551
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Davis Zhang
>            Assignee: Davis Zhang
>            Priority: Major
>
> https://issues.apache.org/jira/browse/HUDI-9505
> For the new secondary index (SI) semantics, refer to the Jira above and its PR description.
>  
> When doing the lookup, we have:
>  * keyToLookup: the raw value passed down by the caller. The value is
> not escaped and can be null.
>  * the HFile being searched.
> The flow is:
> escape keyToLookup -> sort -> HFile lookup
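A minimal Java sketch of the caller-side part of that flow (escape each keyToLookup, then sort). The escaping rule (a backslash in front of '\' and '$'), the NULL_MARKER used for null keys, and the class and method names are assumptions for illustration only; the ticket does not say how Hudi escapes keys or encodes null.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: escaping rule and null encoding are assumptions, not Hudi's.
final class LookupKeyPreparation {
    static final char SEPARATOR = '$';
    static final char ESCAPE = '\\';

    // Assumed null marker: escape() only ever emits a backslash in front of
    // '\' or '$', so the string "\N" can never be the escaped form of a real key.
    static final String NULL_MARKER = "\\N";

    // Assumed escaping: put a backslash in front of '\' and '$'.
    static String escape(String raw) {
        StringBuilder sb = new StringBuilder(raw.length());
        for (char c : raw.toCharArray()) {
            if (c == ESCAPE || c == SEPARATOR) {
                sb.append(ESCAPE);
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // keyToLookup may be null, so map it to the marker before escaping.
    static String escapeKey(String keyToLookup) {
        return keyToLookup == null ? NULL_MARKER : escape(keyToLookup);
    }

    // escape keyToLookup -> sort; the sorted list is what the HFile lookup consumes.
    static List<String> prepare(List<String> keysToLookup) {
        return keysToLookup.stream()
            .map(LookupKeyPreparation::escapeKey)
            .sorted()
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Arrays.asList is used because List.of rejects null elements.
        List<String> keys = Arrays.asList("b", null, "a$x", "a");
        System.out.println(prepare(keys));
    }
}
```

Sorting after escaping presumably lets the reader serve all lookups in one forward pass, since HFile keys are stored in sorted order.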
>  
> The HFile lookup involves key-matching rules, which as of today include:
>  * Full key lookup: RawKeyFromHfile.equals(keyToLookupEscaped)
>  * Prefix lookup: RawKeyFromHfile.startsWith(keyToLookupEscaped)
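The two existing rules can be written as interchangeable lambdas that a scan takes as a parameter. This is a hypothetical sketch: KeyMatchers, FULL_KEY, PREFIX and scan are illustrative names rather than Hudi APIs, and scan filters an in-memory list instead of reading an HFile.

```java
import java.util.List;
import java.util.function.BiPredicate;
import java.util.stream.Collectors;

// Hypothetical names: the ticket describes these matching rules only in prose.
final class KeyMatchers {
    private KeyMatchers() {}

    // Full key lookup: RawKeyFromHfile.equals(keyToLookupEscaped)
    static final BiPredicate<String, String> FULL_KEY =
        (rawKeyFromHfile, keyToLookupEscaped) -> rawKeyFromHfile.equals(keyToLookupEscaped);

    // Prefix lookup: RawKeyFromHfile.startsWith(keyToLookupEscaped)
    static final BiPredicate<String, String> PREFIX =
        (rawKeyFromHfile, keyToLookupEscaped) -> rawKeyFromHfile.startsWith(keyToLookupEscaped);

    // Keeps the keys accepted by the given rule; the rule is passed in as a
    // lambda instead of being hard-coded into the scan.
    static List<String> scan(List<String> rawKeysFromHfile, String keyToLookupEscaped,
                             BiPredicate<String, String> matcher) {
        return rawKeysFromHfile.stream()
            .filter(k -> matcher.test(k, keyToLookupEscaped))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> keys = List.of("key1", "key10", "key2"); // already sorted
        System.out.println(scan(keys, "key1", FULL_KEY)); // only the exact key
        System.out.println(scan(keys, "key1", PREFIX));   // also accepts "key10"
    }
}
```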
>  
> The new index lookup falls into neither bucket, because what we are doing
> is
> extractUnescapedSecondaryKey(RawKeyFromHfile).equals(keyToLookupEscaped)
> This is SI-specific logic. We should not use a plain prefix lookup for it,
> because the behavior is not the same and can cause correctness issues.
> Instead, we should extract the key-matching logic into a lambda function.
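A sketch of an SI-specific matcher, and of why a plain prefix lookup behaves differently. It assumes a hypothetical key layout escape(secondaryKey) + '$' + recordKey, with '\' and '$' escaped by a backslash; the real layout is defined by HUDI-9505 and may differ. The sketch compares the unescaped secondary key with the raw lookup key; comparing the still-escaped portion with keyToLookupEscaped would be equivalent, because this escaping is one-to-one.

```java
import java.util.function.BiPredicate;

// Hypothetical sketch. Assumed key layout: escape(secondaryKey) + '$' + recordKey,
// where escape() puts a backslash in front of '\' and '$'.
final class SecondaryIndexKeyMatch {
    static final char SEPARATOR = '$';
    static final char ESCAPE = '\\';

    static String escape(String raw) {
        StringBuilder sb = new StringBuilder(raw.length());
        for (char c : raw.toCharArray()) {
            if (c == ESCAPE || c == SEPARATOR) {
                sb.append(ESCAPE);
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Reads up to the first unescaped '$' and drops the escape backslashes.
    static String extractUnescapedSecondaryKey(String rawKeyFromHfile) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < rawKeyFromHfile.length(); i++) {
            char c = rawKeyFromHfile.charAt(i);
            if (c == ESCAPE && i + 1 < rawKeyFromHfile.length()) {
                sb.append(rawKeyFromHfile.charAt(++i)); // keep the escaped char literally
            } else if (c == SEPARATOR) {
                return sb.toString(); // separator between secondary key and record key
            } else {
                sb.append(c);
            }
        }
        throw new IllegalArgumentException("no separator in " + rawKeyFromHfile);
    }

    // SI-specific rule: the secondary-key part must equal the lookup key exactly.
    static final BiPredicate<String, String> SECONDARY_KEY_MATCH =
        (rawKeyFromHfile, keyToLookup) ->
            extractUnescapedSecondaryKey(rawKeyFromHfile).equals(keyToLookup);

    // Plain prefix rule, shown for contrast.
    static final BiPredicate<String, String> PREFIX =
        (rawKeyFromHfile, keyToLookupEscaped) -> rawKeyFromHfile.startsWith(keyToLookupEscaped);

    public static void main(String[] args) {
        String hfileKey = "abc$r2"; // secondary key "abc", record key "r2"
        System.out.println(PREFIX.test(hfileKey, escape("ab")));      // prints true
        System.out.println(SECONDARY_KEY_MATCH.test(hfileKey, "ab")); // prints false
    }
}
```

With lookup key "ab", the prefix rule also accepts "abc$r2", whose secondary key is "abc"; the SI matcher rejects it, which is the kind of correctness issue described above.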
>  
> The HFile lookup involves two stages:
>  * Given keyToLookupEscaped, locate the data block. This requires a key-order
> comparison. Today we need compare(keyToLookupEscaped, RawKeyFromHfile); now
> it will become
> {code:java}
> compare(extractUnescapedSecondaryKey(keyToLookupEscaped), RawKeyFromHfile) 
> {code}
>  * Once the data blocks are located, do a sequential scan to find the exact
> key. Previously the match was RawKeyFromHfile.equals(keyToLookupEscaped) or
> RawKeyFromHfile.startsWith(keyToLookupEscaped); now it will become
>  
>  
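Both stages can be sketched together under a hypothetical key layout escape(secondaryKey) + '$' + recordKey (backslash in front of '\' and '$'). Stage 1 binary-searches an in-memory block index (the first key of each block) with a plain String comparison against the probe escape(keyToLookup) + '$'; stage 2 scans forward and applies an SI matcher. The in-memory blocks, the probe construction and all names are assumptions; a real HFile reader seeks through its own block index.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a two-stage SI lookup over in-memory "blocks" of sorted
// keys. Assumed key layout: escape(secondaryKey) + '$' + recordKey.
final class TwoStageLookup {
    static final char SEPARATOR = '$';
    static final char ESCAPE = '\\';

    // Assumed escaping: put a backslash in front of '\' and '$'.
    static String escape(String raw) {
        StringBuilder sb = new StringBuilder(raw.length());
        for (char c : raw.toCharArray()) {
            if (c == ESCAPE || c == SEPARATOR) {
                sb.append(ESCAPE);
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Secondary key with escapes removed: everything before the first unescaped '$'.
    static String extractUnescapedSecondaryKey(String rawKeyFromHfile) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < rawKeyFromHfile.length(); i++) {
            char c = rawKeyFromHfile.charAt(i);
            if (c == ESCAPE && i + 1 < rawKeyFromHfile.length()) {
                sb.append(rawKeyFromHfile.charAt(++i));
            } else if (c == SEPARATOR) {
                return sb.toString();
            } else {
                sb.append(c);
            }
        }
        throw new IllegalArgumentException("no separator in " + rawKeyFromHfile);
    }

    // Stage 1: index of the last block whose first key is <= probe (0 if none).
    // Earlier blocks only hold keys smaller than that block's first key.
    static int locateBlock(List<String> firstKeys, String probe) {
        int lo = 0;
        int hi = firstKeys.size() - 1;
        int found = 0;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (firstKeys.get(mid).compareTo(probe) <= 0) {
                found = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return found;
    }

    // Stage 2: scan forward from the located block and collect keys whose
    // secondary key equals keyToLookup.
    static List<String> lookup(List<List<String>> blocks, String keyToLookup) {
        String probe = escape(keyToLookup) + SEPARATOR;
        List<String> firstKeys = new ArrayList<>();
        for (List<String> block : blocks) {
            firstKeys.add(block.get(0));
        }
        List<String> matches = new ArrayList<>();
        for (int b = locateBlock(firstKeys, probe); b < blocks.size(); b++) {
            for (String key : blocks.get(b)) {
                if (key.compareTo(probe) < 0) {
                    continue; // sorts before any possible match
                }
                if (!extractUnescapedSecondaryKey(key).equals(keyToLookup)) {
                    return matches; // first non-match at or after the probe ends the run
                }
                matches.add(key);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<List<String>> blocks = List.of(
            List.of("a$r1", "a\\$b$r2", "ab$r3"),
            List.of("ab$r4", "abc$r5"),
            List.of("b$r6"));
        System.out.println(lookup(blocks, "ab"));
        System.out.println(lookup(blocks, "a$b"));
        System.out.println(lookup(blocks, "a"));
    }
}
```

The early exit in stage 2 relies on a property of this assumed layout: every matching key starts with the probe, so matches form one contiguous run in sorted order, and the first non-matching key at or after the probe ends the scan. Note that the lookup for "a" returns only "a$r1", whereas a plain prefix lookup with "a" would accept every key in the first two blocks.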



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
