lidavidm commented on a change in pull request #11257:
URL: https://github.com/apache/arrow/pull/11257#discussion_r718649372



##########
File path: cpp/src/arrow/compute/kernels/aggregate_basic.cc
##########
@@ -754,6 +839,30 @@ void RegisterScalarAggregateBasic(FunctionRegistry* 
registry) {
                aggregate::CountInit, func.get());
   DCHECK_OK(registry->AddFunction(std::move(func)));
 
+  func = std::make_shared<ScalarAggregateFunction>(
+      "count_distinct", Arity::Unary(), &count_distinct_doc, 
&default_count_options);
+
+  // Takes any input, outputs int64 scalar
+  aggregate::AddCountDistinctKernel<Int8Type>(int8(), func.get());
+  aggregate::AddCountDistinctKernel<Int16Type>(int16(), func.get());
+  aggregate::AddCountDistinctKernel<Int32Type>(int32(), func.get());
+  aggregate::AddCountDistinctKernel<Date32Type>(date32(), func.get());
+  aggregate::AddCountDistinctKernel<Int64Type>(int64(), func.get());

Review comment:
       Sure, that's reasonable. I think the key points here are just to not 
generate distinct kernels where it's not needed (e.g. Timestamp should use the 
same underlying impl as Int64, and frankly, Int64/UInt64 can/should be 
consolidated, right?) and to make sure things like zoned timestamps are 
properly supported. That said, consistency with the rest of the kernels is 
nice, even if the codebase is not perfect about being consistent throughout 
(and Eduardo has been identifying/filing cases like this) - at the very least, 
it may be worth putting up a follow-up JIRA. (Also because I think there's 
other things we may want to do, e.g. we could perhaps support dictionaries.)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to