Re: [I] Materialize Dictionaries in Group Keys [arrow-datafusion]

via GitHub Sat, 14 Oct 2023 05:55:32 -0700


qrilka commented on issue #7647:
URL: 
https://github.com/apache/arrow-datafusion/issues/7647#issuecomment-1762886861


   @tustvold I've spent some time getting familiar with the code for 
aggregations and I have a couple of questions. Maybe you could help me here?
   I see that for aggregates in an initial plan we have in the base case 2 
nested `AggregateExec`s: Partial and Final(Partitioned). And every 
`AggregateExec` uses `RowConverter` to convert arrays into rows for its input 
and back from rows into arrays for its output.
   I assume that these conversions are basically inevitable because 
`RecordBatch`es are sent between execution stages and aggregation internally 
operates on rows, not arrays, is my understanding correct here?
   It looks to me that it makes sense to convert from Dictionary to Utf8 on the 
first AggregateExec but it doesn't look like there's no mode like 
AggregateInitial. What could be the proper way out here? Maybe there could be a 
conversion step before aggregation added if any of the fields has a dictionary 
type? What would you advise?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [I] Materialize Dictionaries in Group Keys [arrow-datafusion]

Reply via email to