mustafasrepo commented on issue #9972: URL: https://github.com/apache/arrow-datafusion/issues/9972#issuecomment-2055582944
Sorry for the late reply; I was on vacation and couldn't look at this earlier.

> Btw, why is reverse expr in Avg, Sum, MinMax, and Count just returning clone, what is the difference between returns None?

As an example use case, consider the query

```
SELECT SUM(b), FIRST_VALUE(b ORDER BY c DESC)
FROM table
GROUP BY a
```

where `table` is already ordered by `c ASC`. In this case, by taking the reverse of `SUM` (which is itself) and of `FIRST_VALUE`, we can convert the query above to its equivalent form below

```
SELECT SUM(b), LAST_VALUE(b ORDER BY c ASC)
FROM table
GROUP BY a
```

to align the ordering requirement with the existing ordering.

Returning `None` from `fn reverse_expr()` indicates that, when the input data is iterated in reverse order, the result would not be the same as the one the existing version produces. For `SUM`, `AVG`, etc., however, the result is the same when the input data is iterated in reverse order.

As a counterexample, consider the query

```
SELECT ARRAY_AGG(b ORDER BY c DESC), FIRST_VALUE(b ORDER BY c DESC)
FROM table
GROUP BY a
```

where `table` is ordered by `c ASC` as before. There is no way to produce the result of `ARRAY_AGG(b ORDER BY c DESC)` with the ordering `c ASC` at the input. Hence, for `ARRAY_AGG`, this implementation returns `None` to communicate this.

In short, for order-insensitive aggregators we should implement `fn reverse_expr` by returning a clone of the existing aggregator, to communicate that the same result would be generated in reverse order (in fact, in any arbitrary permutation).
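To make the clone-versus-`None` distinction concrete, here is a toy model in Rust. It is only a sketch and not DataFusion's actual trait: `ToyAgg`, its variants, and `reverse_all` are hypothetical names made up for illustration, and the ordering is reduced to a single `descending` flag on column `c`.

```rust
/// Toy stand-in for an aggregate expression (hypothetical, not DataFusion's type).
/// `descending` models the `ORDER BY c ASC/DESC` clause attached to the aggregate.
#[derive(Debug, Clone, PartialEq)]
enum ToyAgg {
    Sum,
    FirstValue { descending: bool },
    LastValue { descending: bool },
    ArrayAgg { descending: bool },
}

impl ToyAgg {
    /// Returns an aggregate that produces the same result when the input is
    /// read in the opposite order, or `None` if no such aggregate exists.
    fn reverse_expr(&self) -> Option<ToyAgg> {
        match self {
            // Order-insensitive: the reverse is the aggregate itself.
            ToyAgg::Sum => Some(self.clone()),
            // FIRST_VALUE under one ordering equals LAST_VALUE under the opposite one.
            ToyAgg::FirstValue { descending } => Some(ToyAgg::LastValue {
                descending: !*descending,
            }),
            ToyAgg::LastValue { descending } => Some(ToyAgg::FirstValue {
                descending: !*descending,
            }),
            // In this sketch, as in the comment above, ARRAY_AGG has no reverse.
            ToyAgg::ArrayAgg { .. } => None,
        }
    }
}

/// Reverses every aggregate of a query, or gives up (`None`) as soon as
/// one of them cannot be reversed.
fn reverse_all(aggs: &[ToyAgg]) -> Option<Vec<ToyAgg>> {
    aggs.iter().map(ToyAgg::reverse_expr).collect()
}
```

With this model, `reverse_all(&[ToyAgg::Sum, ToyAgg::FirstValue { descending: true }])` yields `Sum` together with `LastValue { descending: false }`, mirroring the first rewrite above, while any list containing `ArrayAgg` yields `None`, so no rewrite is attempted.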
