alamb commented on PR #8558: URL: https://github.com/apache/arrow-datafusion/pull/8558#issuecomment-1862649886
@mustafasrepo, @ozankabak and I had a brief meeting this morning. Here are some notes I took.

## Goal:

The longer-term goal is to support many more order-aware normal (as opposed to window) aggregates, such as various N'th value functions, rank functions, etc. These would cover both built-in functions and user-defined aggregates. In other words, this is a much larger feature than just the current three functions.

## Possible Design Discussions

We discussed several possible designs with different tradeoffs (largely in where the complexity ends up):

1. *Aggregators* handle the ordering only: more complicated aggregators, simpler hash aggregate stream, can do per-aggregator optimizations (like nth value).
2. The hash stream handles the ordering of the aggregator arguments: potentially reuses sorts and is more efficient, but will always sort even if the aggregator doesn't require it.
3. Rewrite the query as multiple branches that share a CTE: would keep complexity out of the hash aggregator stream and potentially offers more optimization opportunities, but will take longer to implement.

## Next steps:

1. Run benchmarks on this PR. If they show no performance difference, @alamb will complete a detailed review, and as long as there are no objections from the rest of the community we can merge it as a temporary measure while we work on a more detailed design.
2. @mustafasrepo and @alamb will work on a larger proposal for option 3.
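To make the difference between options 1 and 2 concrete, here is a minimal sketch in plain Python. It is not DataFusion code: the row layout, `FirstValueAccumulator`, and both helper functions are hypothetical names invented for illustration. It computes a `FIRST_VALUE(value ORDER BY ordering_key)` style aggregate per group in two ways, once with the accumulator tracking the ordering itself and once with the stream sorting each group's input first.

```python
from collections import defaultdict

# Hypothetical input rows: (group_key, ordering_key, value)
ROWS = [
    ("a", 3, "a3"),
    ("b", 2, "b2"),
    ("a", 1, "a1"),
    ("b", 5, "b5"),
    ("a", 2, "a2"),
]


class FirstValueAccumulator:
    """Option 1 style: the aggregator handles the ordering itself.

    It only remembers the row with the smallest ordering key seen so far,
    so no sort of the input is needed.
    """

    def __init__(self):
        self.best = None  # (ordering_key, value) or None if no rows seen

    def update(self, ordering_key, value):
        if self.best is None or ordering_key < self.best[0]:
            self.best = (ordering_key, value)

    def evaluate(self):
        return None if self.best is None else self.best[1]


def first_value_aggregator_ordered(rows):
    """Feed unsorted rows to one order-aware accumulator per group."""
    accumulators = defaultdict(FirstValueAccumulator)
    for group, ordering_key, value in rows:
        accumulators[group].update(ordering_key, value)
    return {group: acc.evaluate() for group, acc in accumulators.items()}


def first_value_stream_ordered(rows):
    """Option 2 style: the stream sorts each group's rows by the ordering key,
    and the aggregate simply takes the first row of the sorted input."""
    groups = defaultdict(list)
    for group, ordering_key, value in rows:
        groups[group].append((ordering_key, value))
    result = {}
    for group, items in groups.items():
        items.sort(key=lambda item: item[0])  # always sorts, needed or not
        result[group] = items[0][1]
    return result


if __name__ == "__main__":
    print(first_value_aggregator_ordered(ROWS))
    print(first_value_stream_ordered(ROWS))
```

Both functions return the same answer here. The tradeoff from the notes shows up in where the work happens: the first keeps constant state per group but pushes ordering logic into every aggregator, while the second keeps the aggregator trivial but pays for a sort regardless of whether the aggregate needs one.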
