bhat-vinay opened a new pull request, #10955:
URL: https://github.com/apache/hudi/pull/10955

   The record level index (RLI) speeds up queries (when the appropriate config 
properties are enabled) by pruning files based on the metadata table's RLI 
partition entries. The current implementation can prune files only when the 
query predicate has an `EqualTo` (i.e. '=') or `In` filter/expression on the 
record-key column.
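
As a rough illustration of the pruning idea described above (not Hudi's actual implementation), the sketch below uses a hypothetical in-memory map from record key to file name as a stand-in for the RLI partition. The object name, method name, and file names are all assumptions made for the example.

```scala
object RliPruningSketch {
  // Hypothetical stand-in for the record level index: record key -> file name.
  val recordIndex: Map[String, String] = Map(
    "key1" -> "file-a.parquet",
    "key2" -> "file-b.parquet",
    "key3" -> "file-a.parquet"
  )

  /** Returns the files a query must scan.
    *
    * @param allFiles   every file in the table (or partition)
    * @param recordKeys keys extracted from an `=` or `In` filter on the
    *                   record-key column, or None if no such filter exists
    */
  def candidateFiles(allFiles: Set[String], recordKeys: Option[Seq[String]]): Set[String] =
    recordKeys match {
      case Some(keys) =>
        // Look each key up in the index and keep only the files that hold one.
        val hits: Set[String] = keys.flatMap(k => recordIndex.get(k)).toSet
        allFiles.intersect(hits)
      case None =>
        // No usable record-key filter: pruning is not safe, so scan everything.
        allFiles
    }
}
```

The important property is the `None` branch: pruning is only valid when the keys really come from the record-key column, which is exactly what the bug in this PR violated.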
   
   However, the logic that detects whether an `In` query predicate references 
the record-key column is buggy. This can produce wrong results when the query 
predicate is an `In` expression on a column other than the record-key column. 
This seems like a serious bug that needs to be fixed.
   
   This PR fixes the problem and adds a unit test that covers it.
   
   ### Change Logs
   
   The changes introduced are:
   RecordLevelIndexSupport.scala: The method `filterQueryWithRecordKey(...)` is 
where the bug exists. The case for the `In` expression did not filter out 
expressions based on non-record-key columns. It now does.
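
A minimal sketch of the kind of check the fix implies, assuming simplified stand-in expression types rather than Spark's Catalyst classes. The types `Attr` and `Lit`, the object `RecordKeyFilterSketch`, and the method `recordKeysFor` are hypothetical; the real `filterQueryWithRecordKey(...)` signature is not shown in this description.

```scala
// Simplified stand-ins for query expressions (hypothetical, not Catalyst types).
sealed trait Expr
final case class Attr(name: String) extends Expr
final case class Lit(value: String) extends Expr
final case class EqualTo(left: Expr, right: Expr) extends Expr
final case class In(value: Expr, list: Seq[Expr]) extends Expr

object RecordKeyFilterSketch {
  /** Returns the literal keys if `filter` is an `=` or `In` predicate on the
    * record-key column; returns None for any other filter.
    */
  def recordKeysFor(filter: Expr, recordKeyField: String): Option[Seq[String]] =
    filter match {
      case EqualTo(Attr(name), Lit(v)) if name == recordKeyField => Some(Seq(v))
      case EqualTo(Lit(v), Attr(name)) if name == recordKeyField => Some(Seq(v))
      // The guard on the column name is the essential part: without it, an
      // `In` on any column would be treated as a record-key lookup.
      case In(Attr(name), list) if name == recordKeyField =>
        val literals = list.collect { case Lit(v) => v }
        // Only accept the filter when every list element is a plain literal.
        if (literals.size == list.size) Some(literals) else None
      case _ => None
    }
}
```

With this shape, `In(Attr("city"), ...)` on a table keyed by `id` yields `None`, so no index-based pruning is attempted for it.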
   
   TestRecordLevelIndexWithSQL.scala: Adds a unit test that reproduces the 
wrong results and verifies that the changes in this PR fix them.
   
   ### Impact
   
   Bug fix
   
   ### Risk level (write none, low medium or high below)
   
   None
   
   ### Documentation Update
   
   None. No new configs or user-visible options/changes were added.
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   

