bhat-vinay opened a new pull request, #10955: URL: https://github.com/apache/hudi/pull/10955
Record level index (RLI) speeds up queries (when the appropriate config properties are enabled) by pruning files based on the entries in the metadata table's RLI partition. The current implementation can prune files only when the query predicate has an `EqualTo` (i.e. `=`) or `In` filter/expression on the record-key column. However, the logic that detects whether an `In` query predicate references the record-key column is buggy. This can produce wrong results when the query predicate is an `In` expression on a column other than the record-key column. This is a serious bug that needs to be fixed. This PR fixes the problem and adds a unit test for it.

### Change Logs

The changes introduced are:

- `RecordLevelIndexSupport.scala`: The bug is in the method `filterQueryWithRecordKey(...)`. The match case for `In` expressions did not filter out expressions based on non-record-key columns. This is now fixed.
- `TestRecordLevelIndexWithSQL.scala`: Adds a unit test that reproduces the wrong results and shows that the changes in this PR fix them.

### Impact

Bug fix

### Risk level (write none, low medium or high below)

None

### Documentation Update

None. No new configs or user-visible options/changes are added.

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
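
For readers unfamiliar with the `In` bug described in this PR, the following is a minimal sketch of the intended check, not the actual Hudi code. The types `Expr`, `Attr`, `Lit`, `EqualTo` and `In` are hypothetical, simplified stand-ins for Spark Catalyst expressions, and `recordKeyLiterals` is an illustrative name, not the signature of `filterQueryWithRecordKey(...)`. The point it shows is that an `In` predicate should yield record keys for pruning only when its column is the record-key column.

```scala
object RliInFilterSketch {
  // Hypothetical, simplified stand-ins for Spark Catalyst expressions.
  sealed trait Expr
  final case class Attr(name: String) extends Expr
  final case class Lit(value: String) extends Expr
  final case class EqualTo(left: Expr, right: Expr) extends Expr
  final case class In(value: Expr, list: Seq[Expr]) extends Expr

  // Returns the literal record keys a filter selects, or None when the
  // filter cannot be used for record-key based pruning.
  def recordKeyLiterals(filter: Expr, recordKeyField: String): Option[Seq[String]] =
    filter match {
      case EqualTo(Attr(name), Lit(v)) if name == recordKeyField => Some(Seq(v))
      case EqualTo(Lit(v), Attr(name)) if name == recordKeyField => Some(Seq(v))
      // The guard on the column name is the step the PR describes as missing:
      // without it, an In on any column would be treated as a record-key filter.
      case In(Attr(name), list) if name == recordKeyField =>
        val literals = list.collect { case Lit(v) => v }
        // Only usable when every element of the In list is a plain literal.
        if (literals.size == list.size) Some(literals) else None
      case _ => None
    }
}
```

Under these assumptions, an `In` on the record-key column returns its literals, while an `In` on any other column returns `None`, so no files are pruned for it and the query falls back to its normal scan.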
