neilconway opened a new pull request, #20532:
URL: https://github.com/apache/datafusion/pull/20532

   ## Which issue does this PR close?
   
   - Closes #20530 
   
   ## Rationale for this change
   
   The previous implementation of `array_position` called
   `compare_element_to_list` once per input row. When the needle is a scalar
   (a common case), we can do much better by searching the entire flat
   haystack values array with a single call to `arrow_ord::cmp::not_distinct`,
   then iterating over the set bits of the resulting boolean array to
   determine each row's result.
   
   This is ~5-10x faster than the previous implementation for typical inputs.
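
The sketch below is a minimal, std-only Rust illustration of the two strategies described above. It is not the code from this PR and does not use the Arrow API. The following are all assumptions made for illustration:

* `FlatList`, `position_per_row`, and `position_scalar_needle` are hypothetical names.
* A list column is modeled as plain offsets plus a `Vec<Option<i64>>` of flat values.
* The single comparison pass stands in for `arrow_ord::cmp::not_distinct`.
* Row-level NULL lists are not modeled.

```rust
/// Hypothetical flattened list column (not the Arrow API): row `r` spans
/// `values[offsets[r]..offsets[r + 1]]`. Row-level NULL lists are not modeled.
pub struct FlatList {
    pub offsets: Vec<usize>,
    pub values: Vec<Option<i64>>,
}

/// Baseline: compare the needle against each row's slice separately,
/// analogous to doing one per-row comparison for every input row.
pub fn position_per_row(list: &FlatList, needle: Option<i64>) -> Vec<Option<usize>> {
    list.offsets
        .windows(2)
        .map(|w| {
            list.values[w[0]..w[1]]
                .iter()
                .position(|v| *v == needle)
                .map(|p| p + 1) // SQL array positions are 1-based
        })
        .collect()
}

/// Scalar-needle fast path: one comparison pass over the whole flat values
/// array, then walk the matching indices and map each one to its row.
pub fn position_scalar_needle(list: &FlatList, needle: Option<i64>) -> Vec<Option<usize>> {
    let offsets = &list.offsets;
    let num_rows = offsets.len().saturating_sub(1);
    let mut result = vec![None; num_rows];

    // Step 1: a single pass producing a match mask. `Option` equality treats
    // None == None as true, mirroring IS NOT DISTINCT FROM semantics.
    let mask: Vec<bool> = list.values.iter().map(|v| *v == needle).collect();

    // Step 2: matches are visited in increasing index order, so the row
    // cursor only ever moves forward.
    let mut row = 0;
    for (i, &hit) in mask.iter().enumerate() {
        if !hit {
            continue;
        }
        // Skip rows (including empty ones) that end at or before index `i`.
        while row < num_rows && offsets[row + 1] <= i {
            row += 1;
        }
        if row == num_rows {
            break; // match lies past the last row
        }
        if i < offsets[row] {
            continue; // match lies before the first row (sliced input)
        }
        // Record only the first match in each row.
        if result[row].is_none() {
            result[row] = Some(i - offsets[row] + 1);
        }
    }
    result
}
```

The per-row baseline is included so the two can be checked against each other. The second phase of the fast path is a single forward walk over the matches. All per-element comparison work happens in one pass over the flat array, not in one call per row.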
   
   ## What changes are included in this PR?
   
   * Implement new fast path for `array_position` with scalar needle
   * Improve docs for `array_position`
   * Don't use `internal_err` to report a user-visible error
   
   ## Are these changes tested?
   
   Yes, and benchmarked. Additional tests are added in a separate PR (#20531).
   
   ## Are there any user-facing changes?
   
   No.
   


-- 
This is an automated message from the Apache Git Service.