Hi Arrow[Rust] developers,

I came across a case where I wanted to compare two arrays that aren't
numeric (bool, string, list?), and couldn't conveniently use the
comparison array_ops for this. This is because their trait bounds require
a PrimitiveArray<T> with T: ArrowNumericType.

Users may want to compare non-numeric arrays, at least with {equal |
not equal} functions. Writing a custom function for this isn't hard, but
it leaves a lot of detail (length checks, null handling) to the user.
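To illustrate the detail involved, here is a minimal sketch of such a
custom function. It is plain Rust, not the arrow crate API: `Option<T>`
slices stand in for nullable arrays (None = null), and `eq_elementwise` is
a hypothetical name. It assumes the result should be null wherever either
input is null, mirroring how the numeric kernels treat nulls. Note that a
`PartialEq` bound is all an element-wise comparison needs.

```rust
/// Hypothetical helper: element-wise equality of two nullable columns.
/// `None` plays the role of a null slot; a null on either side yields null.
fn eq_elementwise<T: PartialEq>(
    left: &[Option<T>],
    right: &[Option<T>],
) -> Result<Vec<Option<bool>>, String> {
    // The user has to remember the length check themselves.
    if left.len() != right.len() {
        return Err(format!(
            "cannot compare arrays of different lengths ({} vs {})",
            left.len(),
            right.len()
        ));
    }
    Ok(left
        .iter()
        .zip(right.iter())
        .map(|(l, r)| match (l, r) {
            // Both slots valid: compare the values.
            (Some(a), Some(b)) => Some(a == b),
            // Either slot null: propagate null.
            _ => None,
        })
        .collect())
}

fn main() {
    let a = vec![Some("arrow"), None, Some("rust")];
    let b = vec![Some("arrow"), Some("x"), Some("go")];
    let out = eq_elementwise(&a, &b).unwrap();
    println!("{:?}", out); // prints [Some(true), None, Some(false)]
}
```

A *neq* variant would differ only in using `a != b`.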

I would like to propose that we expand the *compute::eq* and *compute::neq*
functions to cater for non-numeric arrays.
For reference, these can be found in
https://github.com/apache/arrow/blob/master/rust/arrow/src/compute/kernels/comparison.rs

So far, I see 3 options:

1. A fast path for booleans that reuses the existing SIMD-enabled *eq* and
*neq*, if we can cast True=1, False=0 fast enough (the cast kernel already
exists).
2. A slow path for non-numeric arrays where we perform element-wise
comparisons.
3. A hashing approach where we hash values (to i64?) and leverage the
SIMD-enabled *eq* and *neq*.
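On the boolean fast path: since Arrow already stores booleans bit-packed
(one bit per slot, least-significant bit first), one alternative to casting
is to compare the packed words directly. This is a different technique
from the cast-based path in option 1: XNOR gives equality, XOR gives
inequality, and the output validity is assumed to be the AND of the
inputs' validity bitmaps. The sketch below uses plain `u64` slices as a
stand-in for Arrow buffers; all function names are hypothetical and none
of this is the arrow crate API.

```rust
/// Number of u64 words needed to hold `len` bits.
fn words_for(len: usize) -> usize {
    (len + 63) / 64
}

/// Zero the padding bits past `len` in the last word, so they carry no meaning.
fn mask_tail(words: &mut [u64], len: usize) {
    let rem = len % 64;
    if rem != 0 {
        if let Some(last) = words.last_mut() {
            *last &= (1u64 << rem) - 1;
        }
    }
}

/// Equality via XNOR: bit i is set when left[i] == right[i].
fn bool_eq_bits(left: &[u64], right: &[u64], len: usize) -> Vec<u64> {
    assert_eq!(left.len(), words_for(len));
    assert_eq!(right.len(), words_for(len));
    let mut out: Vec<u64> = left.iter().zip(right).map(|(l, r)| !(l ^ r)).collect();
    mask_tail(&mut out, len);
    out
}

/// Inequality via XOR: bit i is set when left[i] != right[i].
fn bool_neq_bits(left: &[u64], right: &[u64], len: usize) -> Vec<u64> {
    assert_eq!(left.len(), words_for(len));
    assert_eq!(right.len(), words_for(len));
    let mut out: Vec<u64> = left.iter().zip(right).map(|(l, r)| l ^ r).collect();
    mask_tail(&mut out, len);
    out
}

/// Output validity: a slot is valid only if it is valid in both inputs.
fn validity_and(left: &[u64], right: &[u64]) -> Vec<u64> {
    left.iter().zip(right).map(|(l, r)| l & r).collect()
}

/// Read bit i (least-significant bit first within each word).
fn get_bit(words: &[u64], i: usize) -> bool {
    ((words[i / 64] >> (i % 64)) & 1) == 1
}

fn main() {
    let left = [0b0101u64]; // [true, false, true, false]
    let right = [0b0011u64]; // [true, true, false, false]
    let eq = bool_eq_bits(&left, &right, 4);
    let as_bools: Vec<bool> = (0..4).map(|i| get_bit(&eq, i)).collect();
    println!("{:?}", as_bools); // prints [true, false, false, true]

    // Slot 2 is null in `right`, so it is null in the result.
    let validity = validity_and(&[0b1111u64], &[0b1011u64]);
    println!("slot 2 valid: {}", get_bit(&validity, 2)); // prints "slot 2 valid: false"
}
```

Each word compares 64 slots at once, and a simple loop like this may be
auto-vectorized by the compiler, though that is not guaranteed.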

Do you have any opinions on the above?

Thanks
Neville
