Rachelint commented on code in PR #12996:
URL: https://github.com/apache/datafusion/pull/12996#discussion_r1818406542
##########
datafusion/physical-plan/src/aggregates/group_values/group_column.rs:
##########
@@ -287,6 +469,63 @@ where
};
}
+ fn vectorized_equal_to(
Review Comment:
🤔 Yes, I agree that the row-by-row checking is not efficient enough,
and switching to an implementation similar to the one in `hash_join` may be
really worth trying.
Maybe it is better to try it in a follow-on PR? The following points are still
not clear to me, and I want to experiment with them:
- Do we need a reusable buffer to hold the taken values?
- It seems that `cmp` for some arrays like `StringArray` and
`StringViewArray` is expensive?
Is it better to just check row by row so that we can skip some unnecessary
checks (if a `row` is not equal in `col a`, we don't need to check it
again in `col b`)?
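
To make the trade-off concrete, here is a minimal sketch in plain Rust. It is not the code in this PR and it does not use Arrow: `Column`, `take_then_compare`, and `compare_row_by_row` are hypothetical names, and a `Vec<&str>` stands in for something like `StringArray`. The first function imitates a "take, then compare whole columns" style with reusable buffers; the second checks row by row and stops at the first column that differs. Both return the per-pair result and the number of value comparisons they performed.

```rust
/// Toy stand-in for a string column (assumption: not an Arrow array).
type Column = Vec<&'static str>;

/// "Take, then compare" style: for every column, gather the requested rows
/// into reusable buffers and compare all pairs, even pairs that already
/// differed in an earlier column.
fn take_then_compare(
    group_cols: &[Column],
    input_cols: &[Column],
    lhs_rows: &[usize],
    rhs_rows: &[usize],
) -> (Vec<bool>, usize) {
    let mut results = vec![true; lhs_rows.len()];
    let mut comparisons = 0;
    // Buffers allocated once and reused for every column.
    let mut lhs_taken: Vec<&str> = Vec::with_capacity(lhs_rows.len());
    let mut rhs_taken: Vec<&str> = Vec::with_capacity(rhs_rows.len());

    for (group_col, input_col) in group_cols.iter().zip(input_cols) {
        lhs_taken.clear();
        rhs_taken.clear();
        lhs_taken.extend(lhs_rows.iter().map(|&r| group_col[r]));
        rhs_taken.extend(rhs_rows.iter().map(|&r| input_col[r]));

        for (i, (l, r)) in lhs_taken.iter().zip(&rhs_taken).enumerate() {
            comparisons += 1;
            results[i] &= l == r;
        }
    }
    (results, comparisons)
}

/// Row-by-row style: for each pair, walk the columns and stop at the
/// first one that differs, so later columns are never touched for it.
fn compare_row_by_row(
    group_cols: &[Column],
    input_cols: &[Column],
    lhs_rows: &[usize],
    rhs_rows: &[usize],
) -> (Vec<bool>, usize) {
    let mut results = Vec::with_capacity(lhs_rows.len());
    let mut comparisons = 0;

    for (&lhs, &rhs) in lhs_rows.iter().zip(rhs_rows) {
        let mut equal = true;
        for (group_col, input_col) in group_cols.iter().zip(input_cols) {
            comparisons += 1;
            if group_col[lhs] != input_col[rhs] {
                equal = false;
                break; // skip the remaining columns for this pair
            }
        }
        results.push(equal);
    }
    (results, comparisons)
}
```

With two columns and three candidate pairs where one pair already differs in `col a`, the first function performs 6 comparisons and the second performs 5, with identical results. Whether that saving outweighs the benefit of comparing a whole column at once is exactly what I would like to measure.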
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]