[GitHub] [arrow] aucahuasi commented on a change in pull request #11257: ARROW-14035: [C++][Python] Implement count distinct kernel

GitBox Wed, 29 Sep 2021 08:24:11 -0700


aucahuasi commented on a change in pull request #11257:
URL: https://github.com/apache/arrow/pull/11257#discussion_r718633160




##########
File path: cpp/src/arrow/compute/kernels/aggregate_basic.cc
##########
@@ -121,6 +122,84 @@ Result<std::unique_ptr<KernelState>> 
CountInit(KernelContext*,
       static_cast<const CountOptions&>(*args.options));
 }
 
+// ----------------------------------------------------------------------
+// Distinct Count implementation
+
+template <typename Type>
+struct CountDistinctImpl : public ScalarAggregator {
+  using MemoTable = typename arrow::internal::HashTraits<Type>::MemoTableType;
+
+  explicit CountDistinctImpl(MemoryPool* memory_pool, CountOptions options)
+      : options(std::move(options)), memo_table_(new MemoTable(memory_pool, 
0)) {}
+
+  Status Consume(KernelContext*, const ExecBatch& batch) override {
+    if (batch[0].is_array()) {
+      const ArrayData& arr = *batch[0].array();
+      auto visit_null = [&]() {
+        if (this->nulls > 0) return Status::OK();
+        ++this->nulls;
+        return Status::OK();
+      };

Review comment:
       Ok now I understand better, but still I was thinking about these lines:
   
   1. The amount of code to make this "prettier" (using visitor, enable_if, 
hard-coded generators, etc) is going to result in more LoC and abstractions. 
Are there any reasons why we want to make this here? Perhaps I'm missing 
something.
   2. Also, if we rely in implicit cast: Doesn't that make it worse? Because we 
will need to apply a casting every time the user wants to use a kernel. 
Explicit registration should result in faster kernel resolutions right?
   
   Let me know what you think here please!
   cc @edponce @pitrou @lidavidm 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [arrow] aucahuasi commented on a change in pull request #11257: ARROW-14035: [C++][Python] Implement count distinct kernel

Reply via email to