Dandandan commented on code in PR #12996:
URL: https://github.com/apache/datafusion/pull/12996#discussion_r1818475481
##########
datafusion/physical-plan/src/aggregates/group_values/group_column.rs:
##########
@@ -287,6 +469,63 @@ where
};
}
+ fn vectorized_equal_to(
Review Comment:
At least `take` for `StringArray` is expensive as it needs to copy the
string data, recheck the offsets, etc. For string / byte views this will
probably be much better.
I think some approach for using `take` + `eq` for primitives while using a
custom one for string arrays will likely be best overal.
Even better would be to write a optimized implementation (an arrow kernel
that performs equality of two arrays by indices efficiently while applying the
same tricks done by take / eq kernels). The join implementation can benefit
from this as well.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]