neilconway opened a new pull request, #20677:
URL: https://github.com/apache/datafusion/pull/20677

   ## Which issue does this PR close?
   
   N/A
   
   ## Rationale for this change
   
   In #20374, `array_has` with a scalar needle was optimized to reconstruct 
matches more efficiently. Unfortunately, that code was incorrect for sliced 
arrays: `values()` returns the entire value buffer (including elements outside 
the visible slice), so the comparison result bitmap also covers those elements 
and the corresponding indexes must be skipped.
   
   We could fix this by just skipping indexes, but it seems more robust and 
efficient to arrange to not compare the needle against elements outside the 
visible range in the first place.
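
   To illustrate the failure mode and the fix described above, here is a minimal sketch using a toy model rather than the actual Arrow or DataFusion code. `SlicedList`, `array_has_naive`, and `array_has_visible` are hypothetical names introduced for illustration only; the sketch assumes `i32` elements and an `offsets` vector with at least one entry.

```rust
/// Toy model of a sliced list array (hypothetical, not the Arrow API):
/// `values` is the whole child buffer, while `offsets` describes only the
/// visible rows, so `offsets[0]` may be greater than zero.
struct SlicedList {
    values: Vec<i32>,
    offsets: Vec<usize>, // assumed non-empty: one entry per row, plus one
}

/// Naive reconstruction: scan the whole buffer and credit each match to a row.
/// A match located before `offsets[0]` is wrongly credited to row 0.
fn array_has_naive(list: &SlicedList, needle: i32) -> Vec<bool> {
    let rows = list.offsets.len() - 1;
    let mut out = vec![false; rows];
    let mut row = 0;
    for (i, v) in list.values.iter().enumerate() {
        if *v != needle {
            continue;
        }
        // Advance to the row whose end offset lies past position `i`.
        while row < rows && i >= list.offsets[row + 1] {
            row += 1;
        }
        if row == rows {
            break; // the match lies past the end of the visible slice
        }
        out[row] = true;
    }
    out
}

/// Revised approach: compare the needle only against the visible window
/// `values[offsets[0]..offsets[last]]`, so no out-of-slice match can occur.
fn array_has_visible(list: &SlicedList, needle: i32) -> Vec<bool> {
    let first = list.offsets[0];
    let last = list.offsets[list.offsets.len() - 1];
    let rows = list.offsets.len() - 1;
    let mut out = vec![false; rows];
    let mut row = 0;
    for (i, v) in list.values[first..last].iter().enumerate() {
        if *v != needle {
            continue;
        }
        let pos = first + i; // absolute position in the full buffer
        // `pos < last`, so this loop always stops at a valid row.
        while pos >= list.offsets[row + 1] {
            row += 1;
        }
        out[row] = true;
    }
    out
}
```

   With `values = [1, 2, 3, 4, 5, 6]` and `offsets = [2, 4, 6]` (the last two rows of a three-row list), searching for `1` makes the naive version report a match in the first visible row, while the windowed version reports no match in either row.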
   
   `array_position` is in a similar situation: it did not have the bug, but it 
still did extra work for sliced arrays by comparing the needle against elements 
outside the visible range.
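
   As a rough sketch of what restricting the comparison looks like for a position lookup, the toy function below evaluates equality only over the visible window and then searches each row's sub-range. `array_position_visible` is a hypothetical name, not the DataFusion implementation; it assumes `i32` elements, a non-empty `offsets` slice, and 1-based positions in the result.

```rust
/// Toy model (not the DataFusion code): for each visible row, return the
/// 1-based position of the first element equal to `needle`, or `None`.
/// `values` is the full child buffer; `offsets` covers only the visible rows.
fn array_position_visible(
    values: &[i32],
    offsets: &[usize],
    needle: i32,
) -> Vec<Option<u64>> {
    let first = offsets[0];
    let last = offsets[offsets.len() - 1];
    // Compare only the visible window instead of the whole buffer.
    let eq: Vec<bool> = values[first..last].iter().map(|v| *v == needle).collect();
    offsets
        .windows(2)
        .map(|w| {
            // Rebase the row's absolute offsets onto the window.
            eq[w[0] - first..w[1] - first]
                .iter()
                .position(|&m| m)
                .map(|p| p as u64 + 1)
        })
        .collect()
}
```

   Comparing against the whole buffer and indexing with absolute offsets would give the same answers, which is consistent with `array_position` being correct but doing unnecessary comparisons on sliced input.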
   
   Benchmarks of the revised code show no performance regression for 
unsliced arrays.
   
   ## What changes are included in this PR?
   
   * Fix `array_has` bug for sliced arrays with scalar needle
   * Improve `array_has` and `array_position` to not compare against elements 
outside the visible range of a sliced array
   * Add unit test for `array_has` bug
   * Add unit test to increase confidence in `array_position` behavior for 
sliced arrays
   
   ## Are these changes tested?
   
   Yes.
   
   ## Are there any user-facing changes?
   
   No.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

