Re: [PR] Add support for conflicting order sensitive aggregates in `ARRAY_AGG` aggregate function [arrow-datafusion]

via GitHub Tue, 19 Dec 2023 04:16:20 -0800


alamb commented on PR #8558:
URL: 
https://github.com/apache/arrow-datafusion/pull/8558#issuecomment-1862649886


   @mustafasrepo, @ozankabak, and I had a brief meeting this morning. Here are 
some notes I took:
   
   ## Goal:
   The longer-term goal is to support many more order-aware normal (as opposed 
to window) aggregates, such as various Nth-value and rank functions. These 
would include both built-in functions and user-defined aggregates. In other 
words, this is a much larger feature than support for the current three functions. 
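
To make the ordering dependence concrete, here is a minimal Rust sketch. It uses plain Rust, not DataFusion's API, and `Row`, `b`, `c`, and `array_agg_ordered` are hypothetical names. The same input rows aggregated with two different `ORDER BY` keys give different arrays, so two such aggregates in one query cannot both be satisfied by sorting the input once.

```rust
// Sketch only (not DataFusion's API): an order-sensitive aggregate such as
// ARRAY_AGG(v ORDER BY k) depends on the ordering key, not just on the input set.

/// One input row with two candidate ordering columns `b` and `c` (hypothetical).
#[derive(Clone, Copy, Debug)]
struct Row {
    v: i32,
    b: i32,
    c: i32,
}

/// Collect `v` values ordered by the key that `key_of` extracts.
/// `sort_by_key` is stable, so rows with equal keys keep their input order.
fn array_agg_ordered<K: Ord>(rows: &[Row], key_of: impl Fn(&Row) -> K) -> Vec<i32> {
    let mut sorted: Vec<Row> = rows.to_vec();
    sorted.sort_by_key(|r| key_of(r));
    sorted.iter().map(|r| r.v).collect()
}

fn main() {
    let rows = [
        Row { v: 1, b: 3, c: 1 },
        Row { v: 2, b: 1, c: 3 },
        Row { v: 3, b: 2, c: 2 },
    ];
    // ARRAY_AGG(v ORDER BY b) and ARRAY_AGG(v ORDER BY c) in the same query:
    // no single pre-sort of `rows` satisfies both orderings.
    let by_b = array_agg_ordered(&rows, |r| r.b);
    let by_c = array_agg_ordered(&rows, |r| r.c);
    println!("{:?} {:?}", by_b, by_c); // prints "[2, 3, 1] [1, 3, 2]"
}
```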
   
   ## Possible Design Discussions
   We discussed several possible designs with different tradeoffs (largely in 
where the complexity lives):
   
   1. The *aggregators* alone handle the ordering: more complicated aggregators, 
but a simpler hash aggregate stream, and room for per-aggregator optimizations 
(like nth value).
   
   2. The hash stream handles the ordering of the aggregator arguments: it can 
potentially reuse sorts and be more efficient, but it will always sort, even if 
the aggregator doesn't require it. 
   
   3. Rewrite the query as multiple branches that share a CTE: this keeps 
complexity out of the hash aggregate stream and potentially offers more 
optimization opportunities, but will take longer to implement.
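
A minimal Rust sketch of option 1 follows. It assumes a hypothetical `OrderedAccumulator` trait that is not DataFusion's actual `Accumulator` API, and `ArrayAgg` and `NthValue` are illustrative names. Rows reach every aggregator unsorted, so two aggregates with conflicting orderings can share one pass. An NTH_VALUE-style aggregator can then apply its own optimization by keeping only `n` entries instead of buffering everything.

```rust
use std::collections::BinaryHeap;

/// Hypothetical interface (not DataFusion's `Accumulator`): rows arrive
/// unsorted, and each aggregator is responsible for its own ORDER BY.
trait OrderedAccumulator {
    type Output;
    fn update(&mut self, key: i64, value: i64);
    fn evaluate(&self) -> Self::Output;
}

/// ARRAY_AGG(value ORDER BY key): must buffer every input, then sort once.
#[derive(Default)]
struct ArrayAgg {
    buf: Vec<(i64, i64)>, // (key, value)
}

impl OrderedAccumulator for ArrayAgg {
    type Output = Vec<i64>;
    fn update(&mut self, key: i64, value: i64) {
        self.buf.push((key, value));
    }
    fn evaluate(&self) -> Vec<i64> {
        let mut sorted = self.buf.clone();
        sorted.sort_by_key(|&(k, _)| k); // stable: equal keys keep arrival order
        sorted.into_iter().map(|(_, v)| v).collect()
    }
}

/// NTH_VALUE(value, n ORDER BY key), 1-based: keeps only the n smallest keys in
/// a max-heap, so the root is the nth smallest once n entries are held.
/// Simplification: equal keys are tie-broken by `value`, not arrival order.
struct NthValue {
    n: usize,
    heap: BinaryHeap<(i64, i64)>, // (key, value)
}

impl NthValue {
    fn new(n: usize) -> Self {
        assert!(n >= 1, "n is 1-based");
        NthValue { n, heap: BinaryHeap::new() }
    }
}

impl OrderedAccumulator for NthValue {
    type Output = Option<i64>;
    fn update(&mut self, key: i64, value: i64) {
        self.heap.push((key, value));
        if self.heap.len() > self.n {
            self.heap.pop(); // the largest key can never be the nth smallest
        }
    }
    fn evaluate(&self) -> Option<i64> {
        if self.heap.len() == self.n {
            self.heap.peek().map(|&(_, v)| v)
        } else {
            None // fewer than n rows were seen
        }
    }
}

fn main() {
    // Two aggregates with conflicting orderings, fed from one unsorted pass.
    let rows = [(10, 3, 1), (20, 1, 3), (30, 2, 2)]; // (value, b, c)
    let mut agg_by_b = ArrayAgg::default();
    let mut second_by_c = NthValue::new(2);
    for &(v, b, c) in &rows {
        agg_by_b.update(b, v);
        second_by_c.update(c, v);
    }
    println!("{:?} {:?}", agg_by_b.evaluate(), second_by_c.evaluate()); // [20, 30, 10] Some(30)
}
```

The tradeoff against option 2 shows up in `ArrayAgg`. It still buffers and sorts its whole input, which is work a stream-level sort could share across aggregators that need the same ordering. `NthValue`, by contrast, can cap its memory at `n` entries only because it owns its ordering.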
   
   
   ## Next steps:
   1. Run benchmarks on this PR. If they show no performance difference, 
@alamb will complete a detailed review, and as long as there are no objections 
from the rest of the community, we can merge it as a temporary measure while we 
work on a more detailed design.  
   2. @mustafasrepo and @alamb will work on a larger proposal for option 3.
   


