qrilka commented on issue #7647: URL: https://github.com/apache/arrow-datafusion/issues/7647#issuecomment-1762886861
@tustvold I've spent some time getting familiar with the code for aggregations and I have a couple of questions. Maybe you could help me here? I see that for aggregates in an initial plan we have in the base case 2 nested `AggregateExec`s: Partial and Final(Partitioned). And every `AggregateExec` uses `RowConverter` to convert arrays into rows for its input and back from rows into arrays for its output. I assume that these conversions are basically inevitable because `RecordBatch`es are sent between execution stages and aggregation internally operates on rows, not arrays, is my understanding correct here? It looks to me that it makes sense to convert from Dictionary to Utf8 on the first AggregateExec but it doesn't look like there's no mode like AggregateInitial. What could be the proper way out here? Maybe there could be a conversion step before aggregation added if any of the fields has a dictionary type? What would you advise? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
