Davis Zhang created HUDI-9551:
---------------------------------

             Summary: Secondary index lookup in table version 9 is not prefix 
lookup
                 Key: HUDI-9551
                 URL: https://issues.apache.org/jira/browse/HUDI-9551
             Project: Apache Hudi
          Issue Type: Bug
            Reporter: Davis Zhang


https://issues.apache.org/jira/browse/HUDI-9505

For the new secondary index (SI) semantics, please refer to the above Jira and 
its PR description.

 

When doing the lookup, we have:
 * keyToLookup: the raw value passed down by the caller. The value is not 
escaped and can be null
 * hfile

The flow is:

escape keyToLookup -> sort -> hfile lookup
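
A minimal sketch of the escape and sort steps of this flow, under stated 
assumptions: the escape character, the separator, the null marker and the 
names LookupKeyPrep, escape and prepare are all hypothetical and are not taken 
from the Hudi code base. It only illustrates that raw keys (including null) 
are escaped first and sorted afterwards, so that the sorted order matches the 
order of the escaped keys handed to the hfile lookup.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class LookupKeyPrep {
    // Hypothetical encoding constants, chosen only for this illustration.
    static final char ESCAPE = '\\';
    static final char SEPARATOR = '$';
    static final String NULL_MARKER = "\u0000"; // assumed stand-in for a null key

    // Escapes every escape or separator character; maps null to the marker.
    static String escape(String raw) {
        if (raw == null) {
            return NULL_MARKER;
        }
        StringBuilder sb = new StringBuilder(raw.length() + 4);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c == ESCAPE || c == SEPARATOR) {
                sb.append(ESCAPE);
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Escape first, then sort, so the order is that of the escaped keys.
    static List<String> prepare(List<String> rawKeys) {
        List<String> escaped = new ArrayList<>(rawKeys.size());
        for (String key : rawKeys) {
            escaped.add(escape(key));
        }
        Collections.sort(escaped);
        return escaped;
    }
}
```

Sorting after escaping matters in this sketch because escaping can change the 
relative order of two keys.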

 

The hfile lookup involves key matching rules, which as of today are:
 * Full key lookup: RawKeyFromHfile.equals(keyToLookupEscaped)
 * Prefix lookup: RawKeyFromHfile.startsWith(keyToLookupEscaped)

 

The new index lookup does not fall into either bucket, because what we are 
doing is

extractUnescapedSecondaryKey(RawKeyFromHfile).equals(keyToLookupEscaped)

This is SI-specific logic. We should not use plain prefix lookup, because its 
behavior is not the same and it can cause correctness issues. We should extract 
a lambda function for key matching.
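
One possible shape for such a key-matching lambda is sketched below. 
Everything in it is an assumption made for illustration: the KeyMatchers 
class, the secondaryKeyPart helper (a simplified stand-in for 
extractUnescapedSecondaryKey that does not unescape), and a raw key layout of 
an escaped secondary key, a '$' separator and a record key. It is not the 
actual Hudi implementation.

```java
import java.util.function.BiPredicate;

final class KeyMatchers {
    // Hypothetical encoding constants, chosen only for this illustration.
    static final char ESCAPE = '\\';
    static final char SEPARATOR = '$';

    // Each matcher takes (rawKeyFromHfile, keyToLookupEscaped).
    static final BiPredicate<String, String> FULL_KEY =
        (rawKey, lookup) -> rawKey.equals(lookup);

    static final BiPredicate<String, String> PREFIX =
        (rawKey, lookup) -> rawKey.startsWith(lookup);

    // SI-specific rule: compare only the secondary key part of the raw key.
    static final BiPredicate<String, String> SECONDARY_INDEX =
        (rawKey, lookup) -> secondaryKeyPart(rawKey).equals(lookup);

    // Returns the text before the first unescaped separator, still escaped.
    static String secondaryKeyPart(String rawKey) {
        for (int i = 0; i < rawKey.length(); i++) {
            char c = rawKey.charAt(i);
            if (c == ESCAPE) {
                i++; // skip the character that follows the escape
            } else if (c == SEPARATOR) {
                return rawKey.substring(0, i);
            }
        }
        return rawKey; // no separator found: treat the whole key as the part
    }
}
```

Under these assumptions, looking up "a" against the raw key "ab$r1" matches 
with PREFIX but not with SECONDARY_INDEX, which is the kind of false positive 
described above. Passing the matcher as a parameter would let the scan code 
stay the same for all three rules.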

 

The hfile lookup involves 2 stages:

- Given keyToLookupEscaped, we need to locate the data block, which requires 
key order comparison. Here we need compare(keyToLookupEscaped, RawKeyFromHfile), 
which will now become
{code:java}
compare(extractUnescapedSecondaryKey(keyToLookupEscaped), RawKeyFromHfile)
{code}
- Once the data blocks are located, we do a sequential scan to find the exact 
key. Previously this was 

RawKeyFromHfile.equals(keyToLookupEscaped) or 
RawKeyFromHfile.startsWith(keyToLookupEscaped); now it will become

 

 



