Rachelint commented on code in PR #12996:
URL: https://github.com/apache/datafusion/pull/12996#discussion_r1818601807
##########
datafusion/physical-plan/src/aggregates/group_values/group_column.rs:
##########
@@ -287,6 +469,63 @@ where
};
}
+ fn vectorized_equal_to(
Review Comment:
> At least `take` for `StringArray` is expensive as it needs to copy the
string data, recheck the offsets, etc. For string / byte views this will
probably be much better. I think some approach for using `take` + `eq` for
primitives while using a custom one for string arrays will likely be best
overal.
>
> Even better would be to write a optimized implementation (an arrow kernel
that performs equality of two arrays by indices efficiently while applying the
same tricks done by take / eq kernels). The join implementation can benefit
from this as well.
Make sense for me, and support it in `arrow` seems really good.
I plan to sort out codes in this pr firstly, and then try to improve the
primitive case using `take` + `eq`.
I guess some case like `q32` in clickbench is possible to get benefit from
it.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]