westonpace opened a new issue, #34786:
URL: https://github.com/apache/arrow/issues/34786

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   The code calculating the output schema is here:
   
   ```
         FieldVector output_fields;
         output_fields.reserve(key_field_ids.size() + measure_size);
         // extract aggregate fields to output schema
         for (const auto& agg_src_fieldset : agg_src_fieldsets) {
           for (int field : agg_src_fieldset) {
             output_fields.emplace_back(input_schema->field(field));
           }
         }
         // extract key fields to output schema
         for (int key_field_id : key_field_ids) {
           output_fields.emplace_back(input_schema->field(key_field_id));
         }
   
         std::shared_ptr<Schema> aggregate_schema = schema(std::move(output_fields));
   ```
   
   This appears to have two issues:
   
    * It inserts the key fields after the measure fields.
    * It builds the measure fields from the function inputs (`input_schema->field(field)`), not from the function outputs, so each measure field carries the name and type of an input column.
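
To make the two points concrete, here is a minimal, self-contained sketch of the ordering I would expect, assuming key fields should come before measure fields and that each measure contributes one field describing its output. It deliberately does not use Arrow types: `SimpleField`, `MeasureOutput`, and `BuildAggregateOutputFields` are hypothetical names made up for illustration, and this is not a proposed patch.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a schema field: just a name and a type name.
struct SimpleField {
  std::string name;
  std::string type;
};

// Hypothetical description of what one measure (aggregate function) produces.
struct MeasureOutput {
  std::string name;
  std::string type;
};

// Builds the output fields with key fields first, followed by one field per
// measure taken from the measure's output (not from its input columns).
std::vector<SimpleField> BuildAggregateOutputFields(
    const std::vector<SimpleField>& input_fields,
    const std::vector<int>& key_field_ids,
    const std::vector<MeasureOutput>& measures) {
  std::vector<SimpleField> output_fields;
  output_fields.reserve(key_field_ids.size() + measures.size());
  // Key fields keep the name and type they have in the input.
  for (int key_field_id : key_field_ids) {
    output_fields.push_back(
        input_fields.at(static_cast<std::size_t>(key_field_id)));
  }
  // Measure fields use the output name and type of the aggregate.
  for (const MeasureOutput& measure : measures) {
    output_fields.push_back(SimpleField{measure.name, measure.type});
  }
  return output_fields;
}
```

For example, grouping an input of `(k: int32, v: int32)` by `k` with a count over `v` would yield `k` followed by an `int64` count field under this sketch, whereas the snippet above would yield `v: int32` followed by `k: int32`.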
   
   I suspect we are getting away with it in many cases because we are not applying projection / emit after the aggregate. At the very least, we should add some test cases that do this so we can verify the output schema is correct. If there is indeed an issue then we should also fix that.
   
   ### Component(s)
   
   C++


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
