rok commented on a change in pull request #10476:
URL: https://github.com/apache/arrow/pull/10476#discussion_r648212025
##########
File path: cpp/src/arrow/compute/kernels/aggregate_basic.cc
##########
@@ -166,32 +168,48 @@ struct BooleanAnyImpl : public ScalarAggregator {
Status MergeFrom(KernelContext*, KernelState&& src) override {
const auto& other = checked_cast<const BooleanAnyImpl&>(src);
this->any |= other.any;
+ this->has_nulls |= other.has_nulls;
return Status::OK();
}
- Status Finalize(KernelContext*, Datum* out) override {
- out->value = std::make_shared<BooleanScalar>(this->any);
+ Status Finalize(KernelContext* ctx, Datum* out) override {
+ if (!options.skip_nulls && !this->any && this->has_nulls) {
Review comment:
Indeed this is to to make the PR "kleen" to match [R
behavior](https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/any).
Meanwhile [Pandas' any is
non-kleen](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.any.html).
```
>>> import pandas as pd
>>> pd.Series([None, None]).any(skipna=True)
False
>>> pd.Series([None, None]).any(skipna=False)
>>>
```
We have three options IMO:
- Revert to non-kleene c++ behaviour and add a small fix to R
- Add kleen any/all kernels and route in R depending on flags
- Keep c++ as is and add a fix to Python (that could introduce a lot of new
unvanted logic)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]