neilconway opened a new issue, #20384:
URL: https://github.com/apache/datafusion/issues/20384

   ### Is your feature request related to a problem or challenge?
   
   When `array_has_any` is passed a scalar for either of its arguments, we can 
use a much faster algorithm: rather than doing O(N*M) comparisons for each row 
of the columnar arg, we can build a hash table on the scalar array once and 
probe it for each row instead.
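
A minimal sketch of the idea, not DataFusion's actual implementation: it uses plain `Vec<i64>` rows instead of Arrow arrays, ignores NULL handling, and the function names are hypothetical. The first function is the per-row nested-loop comparison; the second builds a `HashSet` from the scalar argument once and probes it for every element of the columnar argument.

```rust
use std::collections::HashSet;

/// Baseline: for each row, compare every element against every needle,
/// i.e. O(N*M) comparisons per row.
fn has_any_nested_loop(rows: &[Vec<i64>], needles: &[i64]) -> Vec<bool> {
    rows.iter()
        .map(|row| row.iter().any(|v| needles.iter().any(|n| n == v)))
        .collect()
}

/// Scalar fast path (illustrative): hash the scalar array once, then do an
/// expected O(1) lookup per element of each row.
fn has_any_hashed(rows: &[Vec<i64>], needles: &[i64]) -> Vec<bool> {
    // Built once, reused for every row of the columnar argument.
    let set: HashSet<i64> = needles.iter().copied().collect();
    rows.iter()
        .map(|row| row.iter().any(|v| set.contains(v)))
        .collect()
}
```

Both functions return one boolean per row; under these assumptions the hashed variant does a single pass over the scalar array plus one lookup per row element, rather than rescanning the scalar array for every element.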
   
   ### Describe the solution you'd like
   
   _No response_
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   #18181 discusses a user-reported query where `array_has_any` is slow. In 
that scenario, `array_has_any` is called on a table column and an uncorrelated 
subquery, which is currently passed to `array_has_any` as a pair of columnar 
arguments (i.e., we don't take advantage of the fact that the subquery argument 
is effectively fixed). Optimizing that query involves two steps:
   
   1. Optimize `array_has_any` for a scalar arg, which is this ticket. This has 
value as a standalone optimization.
   2. Query optimization improvement to handle this general class of queries 
better; I'll do some more digging here and file another ticket shortly.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

