westonpace opened a new issue, #34786:
URL: https://github.com/apache/arrow/issues/34786
### Describe the bug, including details regarding any error messages, version, and platform.
The code calculating the output schema is here:
```
FieldVector output_fields;
output_fields.reserve(key_field_ids.size() + measure_size);
// extract aggregate fields to output schema
for (const auto& agg_src_fieldset : agg_src_fieldsets) {
  for (int field : agg_src_fieldset) {
    output_fields.emplace_back(input_schema->field(field));
  }
}
// extract key fields to output schema
for (int key_field_id : key_field_ids) {
  output_fields.emplace_back(input_schema->field(key_field_id));
}
std::shared_ptr<Schema> aggregate_schema = schema(std::move(output_fields));
```
This appears to have two issues:
* It inserts the key fields after the measure fields
* It builds the measure fields from the function inputs rather than the
function outputs
I suspect we are getting away with it in many cases because we are not
applying projection / emit after the aggregate. At the very least, we should
add some test cases that do this so we can verify the output schema is correct.
If there is indeed an issue then we should also fix that.
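To make the two points above concrete, here is a minimal sketch of the schema construction I would expect, assuming keys should precede measures and each measure should be described by the function's output. It uses plain standard-library types, not Arrow. `Field`, `Aggregate` and `MakeAggregateSchema` are hypothetical names invented for illustration and do not exist in the code base.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for a schema field and an aggregate description.
// None of these names come from the Arrow code base.
struct Field {
  std::string name;
  std::string type;
};

struct Aggregate {
  std::string function;            // e.g. "count"
  std::vector<int> src_field_ids;  // input columns the function reads
  Field output;                    // name and type the function produces
};

// Build the output schema: key fields first, then one field per aggregate,
// taken from the aggregate's output rather than from its inputs.
std::vector<Field> MakeAggregateSchema(const std::vector<Field>& input_schema,
                                       const std::vector<int>& key_field_ids,
                                       const std::vector<Aggregate>& aggregates) {
  std::vector<Field> output_fields;
  output_fields.reserve(key_field_ids.size() + aggregates.size());
  // Key fields are copied from the input schema unchanged.
  for (int key_field_id : key_field_ids) {
    assert(key_field_id >= 0 &&
           static_cast<std::size_t>(key_field_id) < input_schema.size());
    output_fields.push_back(input_schema[key_field_id]);
  }
  // Each aggregate contributes exactly one field, regardless of how many
  // input columns it consumes.
  for (const Aggregate& agg : aggregates) {
    output_fields.push_back(agg.output);
  }
  return output_fields;
}
```

A test along these lines would group by a string column, count a float column, and check that the result has the key column first and an integer count column second. The existing code would instead report the float input column first and the key second.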
### Component(s)
C++