Davis Zhang created HUDI-9551:
---------------------------------
Summary: Secondary index lookup in table version 9 is not prefix
lookup
Key: HUDI-9551
URL: https://issues.apache.org/jira/browse/HUDI-9551
Project: Apache Hudi
Issue Type: Bug
Reporter: Davis Zhang
https://issues.apache.org/jira/browse/HUDI-9505
For the new SI (secondary index) semantics, please refer to the above Jira and
its PR description.
When doing the lookup, we have:
* keyToLookup: the raw value passed down by the caller. The value is not
escaped and can be null.
* hfile
The flow is:
escape keyToLookup -> sort -> hfile lookup
The hfile lookup involves key matching rules, which as of today include:
* Full key lookup: RawKeyFromHfile.equals(keyToLookupEscaped)
* Prefix lookup: RawKeyFromHfile.startsWith(keyToLookupEscaped)
For the new index lookup, we don't fall into either bucket, because what we
are doing is
extractUnescapedSecondaryKey(RawKeyFromHfile).equals(keyToLookupEscaped)
This is SI-specific logic, and we should not use plain prefix lookup: the
behavior is not the same and can cause correctness issues. We should extract a
lambda function for key matching.
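To make the difference between the three matching rules concrete, here is a
minimal sketch of what a pluggable key matcher could look like. It is not the
Hudi implementation: the class name KeyMatchers, the '$' separator between
secondary key and record key, and the '\' escape character are assumptions
made for illustration (see HUDI-9505 for the actual key layout), and the
lookup key is compared in unescaped form to keep the example short.

```java
import java.util.function.BiPredicate;

// Hypothetical helper, for illustration only; not part of Hudi.
final class KeyMatchers {
    // Assumed layout of a raw HFile key: <escapedSecondaryKey>$<escapedRecordKey>
    static final char SEPARATOR = '$';
    static final char ESCAPE = '\\';

    // Returns the secondary-key portion of a raw key: everything before the
    // first unescaped separator, with escape characters removed.
    static String extractUnescapedSecondaryKey(String rawKey) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < rawKey.length(); i++) {
            char c = rawKey.charAt(i);
            if (c == ESCAPE && i + 1 < rawKey.length()) {
                sb.append(rawKey.charAt(++i)); // keep the escaped character literally
            } else if (c == SEPARATOR) {
                break; // end of the secondary-key portion
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Arguments are (rawKeyFromHfile, keyToLookup).
    static final BiPredicate<String, String> FULL_KEY =
        (raw, lookup) -> raw.equals(lookup);

    static final BiPredicate<String, String> PREFIX =
        (raw, lookup) -> raw.startsWith(lookup);

    // SI-specific rule: only the secondary-key portion has to be equal.
    static final BiPredicate<String, String> SECONDARY_KEY =
        (raw, lookup) -> extractUnescapedSecondaryKey(raw).equals(lookup);
}
```

With the raw key abcd$rk1 and the lookup key abc, PREFIX reports a match while
SECONDARY_KEY does not, which is the kind of false positive a plain prefix
lookup can produce.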
The hfile lookup involves 2 stages:
- Given keyToLookupEscaped, we need to locate the data block, which requires
key order comparison. Today this is
compare(keyToLookupEscaped, RawKeyFromHfile).
Now it will become
{code:java}
compare(extractUnescapedSecondaryKey(keyToLookupEscaped), RawKeyFromHfile)
{code}
- Once the data blocks are located, we do a sequential scan to find the exact
key. Previously this was
RawKeyFromHfile.equals(keyToLookupEscaped) or
RawKeyFromHfile.startsWith(keyToLookupEscaped); now it will become
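
As a rough illustration of the two stages only (not the actual Hudi HFile
reader), the sketch below models data blocks as sorted in-memory lists and
takes the ordering and the matching rule as parameters. Everything in it is
hypothetical: the class TwoStageLookupSketch, the helper secondaryKeyOf, and
the simplified key layout <secondaryKey>$<recordKey> with no escaping. It
applies the extraction to the raw HFile key in both stages, which is one
possible reading of the change described here, and it assumes raw keys are
stored in an order consistent with their secondary keys.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical sketch, for illustration only; not part of Hudi.
final class TwoStageLookupSketch {
    // Simplified: the secondary key is the text before the first '$'.
    static String secondaryKeyOf(String rawKey) {
        int sep = rawKey.indexOf('$');
        return sep < 0 ? rawKey : rawKey.substring(0, sep);
    }

    // SI-specific ordering, called as compare(keyToLookup, rawKeyFromHfile).
    static final Comparator<String> SI_ORDER =
        (lookup, raw) -> lookup.compareTo(secondaryKeyOf(raw));

    // SI-specific match, called as test(rawKeyFromHfile, keyToLookup).
    static final BiPredicate<String, String> SI_MATCH =
        (raw, lookup) -> secondaryKeyOf(raw).equals(lookup);

    // blocks: non-empty data blocks, globally sorted by raw key.
    static List<String> lookup(List<List<String>> blocks, String keyToLookup,
                               Comparator<String> lookupVsRaw,
                               BiPredicate<String, String> matcher) {
        // Stage 1: locate the last block whose first key sorts strictly
        // before the lookup key (a real reader would binary-search an index).
        int start = 0;
        for (int i = 0; i < blocks.size(); i++) {
            if (lookupVsRaw.compare(keyToLookup, blocks.get(i).get(0)) > 0) {
                start = i;
            } else {
                break;
            }
        }
        // Stage 2: scan sequentially, collecting matches, and stop at the
        // first non-matching key that sorts after the lookup key.
        List<String> hits = new ArrayList<>();
        for (int i = start; i < blocks.size(); i++) {
            for (String rawKey : blocks.get(i)) {
                if (matcher.test(rawKey, keyToLookup)) {
                    hits.add(rawKey);
                } else if (lookupVsRaw.compare(keyToLookup, rawKey) < 0) {
                    return hits;
                }
            }
        }
        return hits;
    }
}
```

Passing a plain string comparison and String.startsWith in place of SI_ORDER
and SI_MATCH gives the prefix behavior, which would also return a key such as
abcd$1 when looking up abc.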