[GitHub] [arrow] aucahuasi commented on a change in pull request #11257: ARROW-14035: [C++][Python] Implement count distinct kernel

GitBox Wed, 29 Sep 2021 08:28:50 -0700


aucahuasi commented on a change in pull request #11257:
URL: https://github.com/apache/arrow/pull/11257#discussion_r718635863




##########
File path: cpp/src/arrow/compute/kernels/aggregate_basic.cc
##########
@@ -754,6 +839,30 @@ void RegisterScalarAggregateBasic(FunctionRegistry* 
registry) {
                aggregate::CountInit, func.get());
   DCHECK_OK(registry->AddFunction(std::move(func)));
 
+  func = std::make_shared<ScalarAggregateFunction>(
+      "count_distinct", Arity::Unary(), &count_distinct_doc, 
&default_count_options);
+
+  // Takes any input, outputs int64 scalar
+  aggregate::AddCountDistinctKernel<Int8Type>(int8(), func.get());
+  aggregate::AddCountDistinctKernel<Int16Type>(int16(), func.get());
+  aggregate::AddCountDistinctKernel<Int32Type>(int32(), func.get());
+  aggregate::AddCountDistinctKernel<Date32Type>(date32(), func.get());
+  aggregate::AddCountDistinctKernel<Int64Type>(int64(), func.get());

Review comment:
       Ok now I understand better, but still I was thinking about these points:
   
   1. The amount of code to make this "prettier" (using visitor, enable_if, 
hard-coded generators, etc) is going to result in more LoC and abstractions. 
Are there any reasons why do we want to make this here? Perhaps I'm missing 
something.
   2. Also, if we rely in implicit cast: Doesn't that make it worse? Because we 
will need to apply a casting every time the user wants to use a kernel. 
Explicit registration should result in faster kernel resolutions right?
   
   Let me know what do you think here please!
   cc @edponce @pitrou @lidavidm




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [arrow] aucahuasi commented on a change in pull request #11257: ARROW-14035: [C++][Python] Implement count distinct kernel

Reply via email to