Re: [PR] feat: Add distinct accumulator for `Perfect Hash Join` [datafusion]

via GitHub Wed, 17 Sep 2025 04:50:19 -0700


adriangb commented on code in PR #17606:
URL: https://github.com/apache/datafusion/pull/17606#discussion_r2354280460



##########
datafusion/functions-aggregate/src/count.rs:
##########
@@ -746,12 +746,25 @@ fn null_count_for_multiple_cols(values: &[ArrayRef]) -> usize {
 /// more efficient such as [`PrimitiveDistinctCountAccumulator`] and
 /// [`BytesDistinctCountAccumulator`]
 #[derive(Debug)]
-struct DistinctCountAccumulator {
+pub struct DistinctCountAccumulator {
     values: HashSet<ScalarValue, RandomState>,

Review Comment:
   Or could we use the information we already have? E.g. every time we add a 
value to our _existing_ hash tables, we check whether it was already there. 
That would be "free": it's just tracking a mutable boolean.
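
   A minimal sketch of that idea, not code from this PR. It assumes the build 
side inserts keys into a set-like table one at a time; 
`DuplicateTrackingSet`, its `i64` key type, and its method names are 
hypothetical stand-ins for whatever the existing hash table is. The only 
real API relied on is `HashSet::insert`, which returns `false` when the 
value was already present.

```rust
use std::collections::HashSet;

/// Hypothetical wrapper (not a DataFusion type): a set that remembers
/// whether any insert ever hit a value that was already present.
#[derive(Debug, Default)]
pub struct DuplicateTrackingSet {
    values: HashSet<i64>,
    /// Flips to `true` on the first repeated value and stays there.
    saw_duplicate: bool,
}

impl DuplicateTrackingSet {
    /// Insert a value, recording whether it was a repeat.
    pub fn insert(&mut self, value: i64) {
        // `HashSet::insert` returns `false` if the value already existed,
        // so the duplicate check reuses the lookup the insert does anyway.
        if !self.values.insert(value) {
            self.saw_duplicate = true;
        }
    }

    /// `true` while every inserted value has been unique so far.
    pub fn all_distinct(&self) -> bool {
        !self.saw_duplicate
    }

    /// Number of distinct values currently stored.
    pub fn distinct_count(&self) -> usize {
        self.values.len()
    }
}
```

   With this shape, no separate distinct accumulator is needed just to learn 
whether the keys are unique: the flag is updated as a side effect of the 
inserts that already happen.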



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


