edponce commented on a change in pull request #11257:
URL: https://github.com/apache/arrow/pull/11257#discussion_r718152329
##########
File path: cpp/src/arrow/compute/kernels/aggregate_basic.cc
##########
@@ -121,6 +122,84 @@ Result<std::unique_ptr<KernelState>>
CountInit(KernelContext*,
static_cast<const CountOptions&>(*args.options));
}
+// ----------------------------------------------------------------------
+// Distinct Count implementation
+
+template <typename Type>
+struct CountDistinctImpl : public ScalarAggregator {
+ using MemoTable = typename arrow::internal::HashTraits<Type>::MemoTableType;
+
+ explicit CountDistinctImpl(MemoryPool* memory_pool, CountOptions options)
+ : options(std::move(options)), memo_table_(new MemoTable(memory_pool,
0)) {}
+
+ Status Consume(KernelContext*, const ExecBatch& batch) override {
+ if (batch[0].is_array()) {
+ const ArrayData& arr = *batch[0].array();
+ auto visit_null = [&]() {
+ if (this->nulls > 0) return Status::OK();
+ ++this->nulls;
+ return Status::OK();
+ };
Review comment:
IIUC, `this->nulls` is of interest in terms of it either being zero or
non-zero, and if it is non-zero the actual value does not matters. The lambda
has the check to try to short-circuit based on it being non-zero but I will
argue that the conditional check can be more expensive than a simple integer
add. So I would simplify it to an unconditional counter:
```c++
auto visit_null = [&]() {
++this->nulls;
return Status::OK();
}
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]