hudi-bot opened a new issue, #17070:
URL: https://github.com/apache/hudi/issues/17070

   https://issues.apache.org/jira/browse/HUDI-9505
   
   For the new SI (secondary index) semantics, please refer to the above Jira and the PR description.
   
    
   
   When doing the lookup, we have:
    * keyToLookup: the raw value passed down by the caller. The value is not escaped and can be null.
    * hfile
   
   The flow is:
   
   escape keyToLookup -> sort -> hfile lookup
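
   A minimal sketch of the first two steps (escape, then sort), under stated assumptions: `$` as the separator between secondary key and record key, `\` as the escape character, and a fixed placeholder for null keys. These are illustrative choices, not confirmed by this issue, and `LookupKeyPrep`, `escape`, and `escapeAndSort` are hypothetical names rather than Hudi APIs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class LookupKeyPrep {
    // Assumptions for illustration only: '$' is the separator, '\' is the
    // escape character, and a null key is encoded as a fixed placeholder.
    static final char SEPARATOR = '$';
    static final char ESCAPE = '\\';
    static final String NULL_PLACEHOLDER = "\0";

    // Escapes separator and escape characters so the key can be compared
    // against keys stored in escaped form.
    static String escape(String rawKey) {
        if (rawKey == null) {
            return NULL_PLACEHOLDER;
        }
        StringBuilder sb = new StringBuilder(rawKey.length());
        for (int i = 0; i < rawKey.length(); i++) {
            char c = rawKey.charAt(i);
            if (c == SEPARATOR || c == ESCAPE) {
                sb.append(ESCAPE);
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Escapes every key, then sorts the escaped keys so they can be looked up
    // in one ordered pass.
    static List<String> escapeAndSort(List<String> keysToLookup) {
        List<String> escaped = new ArrayList<>(keysToLookup.size());
        for (String key : keysToLookup) {
            escaped.add(escape(key));
        }
        Collections.sort(escaped);
        return escaped;
    }
}
```

   Sorting happens after escaping because the ordering that matters is the ordering of the escaped form, which is what gets compared against the keys in the hfile.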
   
    
   
   The hfile lookup involves some key matching rules, which as of today include:
    * Full key lookup: RawKeyFromHfile.equals(keyToLookupEscaped)
    * Prefix lookup: RawKeyFromHfile.startsWith(keyToLookupEscaped)
   
    
   
   The new index lookup does not fall into either bucket, because what we are doing is
   
   extractUnescapedSecondaryKey(RawKeyFromHfile).equals(keyToLookupEscaped)
   
   This is SI-specific logic, and we should not use plain prefix lookup, because the behavior is not the same and can cause correctness issues. We should extract a lambda function for key matching.
   
    
   
   The hfile lookup involves 2 stages:
   
   - Given keyToLookupEscaped, we need to locate the data block, which requires key order comparison. Here we need compare(keyToLookupEscaped, RawKeyFromHfile). Now it will become
   {code:java}
   compare(extractUnescapedSecondaryKey(keyToLookupEscaped), RawKeyFromHfile)
   {code}
   - Once the data blocks are located, we do a sequential scan to find the exact key. Previously it was

   RawKeyFromHfile.equals(keyToLookupEscaped) or RawKeyFromHfile.startsWith(keyToLookupEscaped); now it will become
   
    
   
    
   
   ## JIRA info
   
   - Link: https://issues.apache.org/jira/browse/HUDI-9551
   - Type: Bug

