jayzhan211 opened a new issue, #12163:
URL: https://github.com/apache/datafusion/issues/12163

   ### Is your feature request related to a problem or challenge?
   
   We found that computing with the Eq kernel is much faster than using RowConverter for `array_has`.
   
   Other functions could likely be sped up with the Eq kernel as well:
   - [ ] array_has_all
   - [ ] array_has_any
   - [ ] array_intersect
   - [ ] array_distinct (Maybe 🤔 ?)
   - [ ] array_except (Maybe 🤔 ?)
   
   ### Describe the solution you'd like
   
   The overall idea is to flatten the left-hand side list, iterate over the right-hand side elements, and apply the Eq kernel to each element (#12062). We can tell whether an element is in the left-hand side list by checking the `true_count` of the result (and the `null_count` for null handling). The Eq kernel is vectorized, which is the key to the speedup.
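
To make the idea concrete, here is a minimal sketch in plain Rust. It only models the behaviour described above and does not use the Arrow API: `eq_scalar`, `true_count` and `null_count` are hypothetical stand-ins for the Eq kernel and the counts on the resulting boolean array, and the rule that a null on either side yields a null result is an assumption of this sketch.

```rust
/// Model of a vectorized equality kernel: compares every element of the
/// flattened `values` with a single scalar `needle`.
/// Assumption: a null (None) on either side produces a null result.
fn eq_scalar(values: &[Option<i64>], needle: Option<i64>) -> Vec<Option<bool>> {
    values
        .iter()
        .map(|&v| match (v, needle) {
            (Some(a), Some(b)) => Some(a == b),
            _ => None,
        })
        .collect()
}

/// Number of positions where the comparison is definitely true.
fn true_count(mask: &[Option<bool>]) -> usize {
    mask.iter().filter(|m| **m == Some(true)).count()
}

/// Number of positions where the comparison is null.
fn null_count(mask: &[Option<bool>]) -> usize {
    mask.iter().filter(|m| m.is_none()).count()
}
```

With this model, an element is known to be in the list when `true_count` of its mask is greater than zero. How `null_count` should influence the final answer is left open here, as in the description above.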
   
   
   ## Example
   ### array_has_all
   array_has_all([1,2,3], [1,2]) -> true
   Iterate over [1,2]: compare [1,2,3] with 1 and [1,2,3] with 2. We get [true,false,false] and [false,true,false]. Both boolean arrays contain a true, so the result is true.
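
A rough single-row sketch of this check in plain Rust, ignoring nulls. `eq_mask` is a hypothetical stand-in for the Eq kernel, and this is not the Arrow-based implementation:

```rust
/// Stand-in for the Eq kernel: one boolean per element of `haystack`.
fn eq_mask(haystack: &[i64], needle: i64) -> Vec<bool> {
    haystack.iter().map(|&v| v == needle).collect()
}

/// True when every element of `needles` yields a mask with at least one `true`.
fn array_has_all(haystack: &[i64], needles: &[i64]) -> bool {
    needles.iter().all(|&needle| {
        let true_count = eq_mask(haystack, needle).iter().filter(|&&hit| hit).count();
        true_count > 0
    })
}
```

For example, `array_has_all(&[1, 2, 3], &[1, 2])` evaluates the two masks listed above and returns `true`.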
   
   ### array_has_any
   array_has_any([1,2,3], [1,4]) -> true
   Iterate over [1,4]: compare [1,2,3] with 1 and [1,2,3] with 4. We get [true,false,false] and [false,false,false]. Since the first boolean array contains a true, the result is true.
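
The same single-row model works for `array_has_any`, where one hit is enough. Again this is a plain-Rust sketch that ignores nulls, with `eq_mask` as a hypothetical stand-in for the Eq kernel:

```rust
/// Stand-in for the Eq kernel: one boolean per element of `haystack`.
fn eq_mask(haystack: &[i64], needle: i64) -> Vec<bool> {
    haystack.iter().map(|&v| v == needle).collect()
}

/// True when at least one element of `needles` yields a mask containing `true`.
/// `any` short-circuits, so later elements are skipped after the first hit.
fn array_has_any(haystack: &[i64], needles: &[i64]) -> bool {
    needles.iter().any(|&needle| {
        let true_count = eq_mask(haystack, needle).iter().filter(|&&hit| hit).count();
        true_count > 0
    })
}
```

For example, `array_has_any(&[1, 2, 3], &[1, 4])` returns `true` after the first comparison already finds a match.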
   
   ### array_intersect
   array_intersect([1,2,3], [1,2]) -> [1,2]
   The same idea as above: we know that both elements are contained in the list. The difference is that we return an array. We could build the expected array with MutableArrayData.
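
A plain-Rust sketch of how the matches could be collected into an output array. A `Vec` stands in for MutableArrayData here, `eq_mask` is a hypothetical stand-in for the Eq kernel, and both the output order (that of the right-hand side) and the skipping of duplicates are assumptions of this sketch rather than confirmed semantics:

```rust
/// Stand-in for the Eq kernel: one boolean per element of `haystack`.
fn eq_mask(haystack: &[i64], needle: i64) -> Vec<bool> {
    haystack.iter().map(|&v| v == needle).collect()
}

/// Collects the elements of `right` that also occur in `left`.
/// Assumptions: output follows the order of `right`, duplicates are dropped.
fn array_intersect(left: &[i64], right: &[i64]) -> Vec<i64> {
    let mut out = Vec::new();
    for &needle in right {
        let found = eq_mask(left, needle).iter().any(|&hit| hit);
        if found && !out.contains(&needle) {
            out.push(needle);
        }
    }
    out
}
```

With the example above, `array_intersect(&[1, 2, 3], &[1, 2])` yields `[1, 2]`.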
   
   I'm not entirely sure about distinct and except, but they are worth figuring out as well.
   
   For more implementation details, see `array_has` as a reference.
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   _No response_

