zanmato1984 commented on issue #43768:
URL: https://github.com/apache/arrow/issues/43768#issuecomment-2303644889
> Edit: I've tried writing tests here and found it's actually bug-free:
>
> 1. An `ExecBatch` is regarded as having size == 1 when all inputs are scalar (constant)
> 2. So the size is always 1; this is also handled by `PromoteExecSpanScalars`
>
> So it's actually always 1 here

You are right when only compute kernels are involved. In that case, you can
assume that when the argument of `any` is a scalar, the batch length must be
`1`.
However, this might not hold in a more complex context, e.g. Acero.
Here is a concrete test that reproduces the expected bug (explained at the end):
```C++
TEST(ScalarAggregate, BuggyAny) {
  // One input batch of 4 rows; the column itself is never read.
  std::shared_ptr<Schema> in_schema = schema({field("not_used", int32())});
  std::vector<ExecBatch> batches{
      ExecBatchFromJSON({int32()}, "[[42], [42], [42], [42]]")};
  // `any` over the projected literal, requiring at least 2 values.
  std::vector<Aggregate> aggregates = {
      Aggregate("any",
                std::make_shared<compute::ScalarAggregateOptions>(/*skip_nulls=*/false,
                                                                  /*min_count=*/2),
                FieldRef("literal_true"))};
  // source -> project(literal true) -> scalar aggregate
  Declaration plan = Declaration::Sequence(
      {{"exec_batch_source", ExecBatchSourceNodeOptions(in_schema, std::move(batches))},
       {"project", ProjectNodeOptions({literal(true)}, {"literal_true"})},
       {"aggregate", AggregateNodeOptions(aggregates)}});
  ASSERT_OK_AND_ASSIGN(BatchesWithCommonSchema out_batches,
                       DeclarationToExecBatches(plan));
  std::cout << out_batches.batches[0].values[0].ToString() << std::endl;
}
```
Output:
```
Scalar(null)
```
Explanation: A single source node emits 1 batch of 4 rows (the contents don't
matter), followed by a projection node that outputs only the literal `true`
(also 4 rows). The tricky part is what this projection node emits: a batch of
logically 4 rows but with a single scalar column. When this batch is eventually
ingested by the subsequent aggregation node, which calls `any` on this scalar
column with `min_count` set to `2`, the result is null, presumably because the
scalar is counted as one value rather than four, even though 4 valid rows are
enough to satisfy `min_count`.
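
To make the suspected failure mode concrete, here is a toy model of the counting logic. It is a sketch only and not Arrow code: `ToyScalarColumn`, `AnyCountingOne`, and `AnyCountingLength` are hypothetical names, and the assumption that the kernel counts a scalar as one value is inferred from the output above rather than taken from the implementation.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical stand-in for a batch column that is a single scalar broadcast
// over `length` logical rows. This is NOT Arrow's ExecSpan or ExecBatch.
struct ToyScalarColumn {
  bool value;      // the scalar's boolean value
  bool is_valid;   // false if the scalar is null
  int64_t length;  // logical row count of the enclosing batch
};

// Assumed buggy behaviour: a valid scalar contributes exactly one value,
// regardless of how many rows the batch logically has.
std::optional<bool> AnyCountingOne(const ToyScalarColumn& col, int64_t min_count) {
  const int64_t count = col.is_valid ? 1 : 0;
  if (count < min_count) return std::nullopt;  // too few values: null result
  return col.value;
}

// Alternative: a valid scalar contributes one value per logical row.
std::optional<bool> AnyCountingLength(const ToyScalarColumn& col, int64_t min_count) {
  const int64_t count = col.is_valid ? col.length : 0;
  if (count < min_count) return std::nullopt;  // too few values: null result
  return col.value;
}
```

With a column of `{true, valid, length 4}` and `min_count` of `2`, the first variant yields null (matching the `Scalar(null)` seen in the test) while the second yields `true`.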