jorisvandenbossche commented on a change in pull request #10476:
URL: https://github.com/apache/arrow/pull/10476#discussion_r648385623
##########
File path: cpp/src/arrow/compute/kernels/aggregate_basic.cc
##########
@@ -166,32 +168,48 @@ struct BooleanAnyImpl : public ScalarAggregator {
Status MergeFrom(KernelContext*, KernelState&& src) override {
const auto& other = checked_cast<const BooleanAnyImpl&>(src);
this->any |= other.any;
+ this->has_nulls |= other.has_nulls;
return Status::OK();
}
- Status Finalize(KernelContext*, Datum* out) override {
- out->value = std::make_shared<BooleanScalar>(this->any);
+ Status Finalize(KernelContext* ctx, Datum* out) override {
+ if (!options.skip_nulls && !this->any && this->has_nulls) {
Review comment:
> Meanwhile Pandas' any is non-kleen.
The pandas `any`/`all` methods were broken for object dtype (and since numpy
doesn't support nulls in its boolean dtype, boolean data with missing values
ends up as object dtype), so it's best not to use that as a reference (see e.g.
https://github.com/pandas-dev/pandas/issues/27709).
For the new nullable boolean dtype in pandas, the `any`/`all` methods also use
Kleene logic, like in R.
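
To make the Kleene behaviour concrete, here is a minimal standalone sketch of
what I'd expect `any`/`all` to return. It is not the Arrow kernel code:
`KleeneAny` and `KleeneAll` are hypothetical helper names, `std::nullopt`
stands in for a null, and the `skip_nulls` parameter is assumed to mean the
same thing as `options.skip_nulls` in the diff above.

```cpp
#include <optional>
#include <vector>

// Hypothetical helper (not Arrow's API): Kleene-style "any" over nullable
// booleans, where std::nullopt represents a null value.
std::optional<bool> KleeneAny(const std::vector<std::optional<bool>>& values,
                              bool skip_nulls) {
  bool any = false;
  bool has_nulls = false;
  for (const auto& v : values) {
    if (!v.has_value()) {
      has_nulls = true;
    } else if (*v) {
      any = true;
    }
  }
  // Same condition as in the diff: no true value was seen, but a null could
  // have been true, so the result is unknown (null).
  if (!skip_nulls && !any && has_nulls) {
    return std::nullopt;
  }
  return any;
}

// Hypothetical helper: the mirror image for "all". A null only matters when
// every non-null value was true, since a single false already decides it.
std::optional<bool> KleeneAll(const std::vector<std::optional<bool>>& values,
                              bool skip_nulls) {
  bool all = true;
  bool has_nulls = false;
  for (const auto& v : values) {
    if (!v.has_value()) {
      has_nulls = true;
    } else if (!*v) {
      all = false;
    }
  }
  if (!skip_nulls && all && has_nulls) {
    return std::nullopt;
  }
  return all;
}
```

With nulls not skipped, `[false, null]` gives null for `any` and false for
`all`, while `[true, null]` gives true for `any` and null for `all`. With nulls
skipped, the nulls are ignored and the result is never null.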