alamb commented on a change in pull request #984:
URL: https://github.com/apache/arrow-rs/pull/984#discussion_r761431099
##########
File path: arrow/src/compute/kernels/comparison.rs
##########
@@ -200,6 +201,42 @@ macro_rules! compare_op_scalar_primitive {
}};
}
+macro_rules! compare_dict_op_scalar {
+ ($left:expr, $right:expr, $op:expr) => {{
+ let null_bit_buffer = $left
+ .data()
+ .null_buffer()
+ .map(|b| b.bit_slice($left.offset(), $left.len()));
+
+ let values = $left
+ .values()
+ .as_any()
+ .downcast_ref::<StringArray>()
+ .unwrap();
+
+ // Safety:
+ // `i < $left.len()`
+ let comparison = (0..$left.len()).map(|i| unsafe {
+ let key = $left.keys().value_unchecked(i).to_usize().unwrap();
+ $op(values.value_unchecked(key), $right)
+ });
Review comment:
🤔
Yes this is definitely tricky. Maybe taking a step back, and think about the
usecase: comparing DictionaryArrays to literals.
For example, if you look at the comparison kernels (for `eq`) ,
https://docs.rs/arrow/6.3.0/arrow/compute/kernels/comparison/index.html we find;
```
eq_scalar
eq_bool_scalar
eq_utf8_scalar
```
With each being typed based on the type of scalar (because the arrays are
typed)
The issue with a `DictionaryArray` is that it could have numbers, bool,
strings, etc. so we can't have a single entrypoint as we do with other types of
arrays
So i am thinking we would need something like
```
eq_dict_scalar // numeric
eq_dict_bool_scalar // boolean
eq_dict_utf8_scalar // strings
```
where each of those kernels would be able to downcast the array
appropriately.
However, having three functions for each dict kernel seems somewhat crazy.
That is where my dyn idea was coming from. If we are going to add three new
kernels for each operator (`eq`, `lt`, etc) we could perhaps add
```
eq_dyn_scalar // numeric
eq_dyn_bool_scalar // boolean
eq_dyn_utf8_scalar // strings
```
etc
Which handle DictionaryArray as well as dispatching to the other
`eq_scalar`, `eq_bool_scalar`, `eq_utf8_scalar` as appropriate.
Does that make sense? I can try and sketch out the interface this weekend
sometime
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]