Davis-Zhang-Onehouse opened a new pull request, #13523: URL: https://github.com/apache/hudi/pull/13523
### Change Logs

To look up the key `secKey` through the secondary index, the lookup has to match index records whose keys have the form `secKey$recKey`. Specifically, it must match every index record whose secondary-key portion (everything before the `$`) equals the given `secKey`.

Today this matching is done by prefix matching, which is not always correct. For example, given the index records `secKey1$recKey`, `secKey2$recKey` and `secKey3$recKey`, a lookup for `secKey` should match nothing. Prefix matching instead reports all three as matches, which causes query correctness issues.

To fix this, we need custom key-matching logic. Given the lookup key `secKey` and an index record key `secKey$recKey`, we should extract the secondary key from the record key and compare it to the lookup key with `String.equals`.

This PR extends the HFile record prefix-matching iterator to abstract away two things:

- how the seek key (the key passed to the HFile reader's `seekTo`) is generated;
- how keys are matched.

Based on the predicate type, we can then pick the iterator that uses the proper implementation underneath.

Here is how it works:

- Before: a boolean flag `isFullKey` chooses between full-key matching and prefix matching.
  After: the `Expression.Operator` enum specifies which type of HFile iterator to use.
- Before: in the HFile prefix reader iterator,
  - the seek key passed to `reader.seekTo` is exactly the lookup key in its escaped form;
  - the match logic is `String.startsWith`.

  After: prefix lookup supports two iterators, the old prefix iterator and a new secondary-key iterator. The secondary-key iterator:
  - uses `[escaped lookup key] + "$"` as the seek key;
  - matches with `extractEscapedSecKey(indexRecordKey).equals(escapedLookupKey)`.

### Impact

Secondary index lookup now behaves correctly.
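The following is a minimal, self-contained sketch of the seek-key / matcher split described in the Change Logs. It is an illustration under stated assumptions and is not the code in this PR.

- `SecondaryKeyLookupSketch`, `KeyMatcher` and `MatchOperator` are hypothetical names. `MatchOperator` stands in for Hudi's `Expression.Operator`.
- A `TreeSet` of keys stands in for the sorted HFile, and `tailSet(seekKey, true)` emulates `seekTo`.
- The escaping scheme is assumed: `\` and `$` inside a secondary key are prefixed with `\`.
- `extractEscapedSecKey` borrows its name from the PR, but its body is a guess.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

public class SecondaryKeyLookupSketch {

  // Hypothetical stand-in for Expression.Operator: selects which iterator behaviour to use.
  enum MatchOperator { STARTS_WITH, SECONDARY_KEY_EQUALS }

  // Hypothetical strategy: how to build the seek key and how to test a record key.
  interface KeyMatcher {
    String seekKey(String escapedLookupKey);
    boolean matches(String recordKey, String escapedLookupKey);
  }

  // Old behaviour: seek to the escaped lookup key and accept any key that starts with it.
  static final KeyMatcher PREFIX = new KeyMatcher() {
    public String seekKey(String k) { return k; }
    public boolean matches(String recordKey, String k) { return recordKey.startsWith(k); }
  };

  // New behaviour: seek to escapedLookupKey + "$" and accept keys whose secondary part equals it.
  static final KeyMatcher SECONDARY_KEY = new KeyMatcher() {
    public String seekKey(String k) { return k + "$"; }
    public boolean matches(String recordKey, String k) {
      return k.equals(extractEscapedSecKey(recordKey));
    }
  };

  // Assumed escaping scheme: '\' and '$' inside the secondary key are prefixed with '\'.
  static String escape(String raw) {
    StringBuilder sb = new StringBuilder();
    for (char c : raw.toCharArray()) {
      if (c == '\\' || c == '$') sb.append('\\');
      sb.append(c);
    }
    return sb.toString();
  }

  // Returns everything before the first unescaped '$', still in escaped form.
  static String extractEscapedSecKey(String recordKey) {
    for (int i = 0; i < recordKey.length(); i++) {
      char c = recordKey.charAt(i);
      if (c == '\\') { i++; continue; } // skip the character this backslash escapes
      if (c == '$') return recordKey.substring(0, i);
    }
    return null; // no separator: not a well-formed secondary index key
  }

  static KeyMatcher matcherFor(MatchOperator op) {
    switch (op) {
      case STARTS_WITH: return PREFIX;
      case SECONDARY_KEY_EQUALS: return SECONDARY_KEY;
      default: throw new IllegalArgumentException("Unsupported operator: " + op);
    }
  }

  // Emulates seekTo plus a forward scan over sorted keys, stopping at the first non-match.
  static List<String> lookup(NavigableSet<String> sortedKeys, String rawLookupKey, MatchOperator op) {
    KeyMatcher matcher = matcherFor(op);
    String escaped = escape(rawLookupKey);
    List<String> result = new ArrayList<>();
    for (String key : sortedKeys.tailSet(matcher.seekKey(escaped), true)) {
      if (!matcher.matches(key, escaped)) break;
      result.add(key);
    }
    return result;
  }

  public static void main(String[] args) {
    NavigableSet<String> index = new TreeSet<>(List.of(
        "secKey$recKey0", "secKey1$recKey", "secKey2$recKey", "secKey3$recKey"));
    System.out.println(lookup(index, "secKey", MatchOperator.STARTS_WITH));
    System.out.println(lookup(index, "secKey", MatchOperator.SECONDARY_KEY_EQUALS));
  }
}
```

In `main`, the index adds one genuine match, `secKey$recKey0`, to the three keys from the example above. With `STARTS_WITH`, all four keys are returned. With `SECONDARY_KEY_EQUALS`, only `secKey$recKey0` is returned.

Stopping the scan at the first non-match is safe in this sketch. Under the assumed escaping, every key whose secondary part equals the escaped lookup key starts with `escapedLookupKey + "$"`, and every key with that prefix has that secondary part. So the matches form one contiguous run in sorted order.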
### Risk level (write none, low, medium or high below)

None

### Documentation Update

None

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
