mustafasrepo commented on issue #9972:
URL: 
https://github.com/apache/arrow-datafusion/issues/9972#issuecomment-2055582944

   Sorry for the late reply. I was on vacation and couldn't look at this sooner.
   > Btw, why is reverse expr in Avg, Sum, MinMax, and Count just returning 
clone, what is the difference between returns None?
   
   As an example use case, consider the query
   ```
   SELECT SUM(b), FIRST_VALUE(b ORDER BY c DESC)
   FROM table
   GROUP BY a
   ```
   where `table` is already ordered by `c ASC`. In this case, by taking the
   reverse of `SUM` (which is `SUM` itself) and of `FIRST_VALUE`, we can convert
   the query above into its equivalent form below
   ```
   SELECT SUM(b), LAST_VALUE(b ORDER BY c ASC)
   FROM table
   GROUP BY a
   ```
   to align the ordering requirement with the existing ordering. Returning
   `None` from `fn reverse_expr()` indicates that when the input data is
   iterated in reverse order, the result would not be the same as the one
   produced by the existing version. However, for `SUM`, `AVG`, etc., the result
   is the same when the input data is iterated in reverse order. As a
   counterexample, consider the query
   ```
   SELECT ARRAY_AGG(b ORDER BY c DESC), FIRST_VALUE(b ORDER BY c DESC)
   FROM table
   GROUP BY a
   ```
   where `table` is ordered by `c ASC` as before. There is no way to produce the
   result of `ARRAY_AGG(b ORDER BY c DESC)` with the ordering `c ASC` at the
   input. Hence, for `ARRAY_AGG`, this implementation returns `None` to
   communicate this.
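
   The three cases above can be checked on a tiny in-memory group. This is only
   a toy sketch, not DataFusion code: the `Row` type, the sample values, and all
   function names are made up for illustration, and each function consumes the
   rows in the order they are given, the way a streaming aggregator would.

   ```rust
   /// One row of a single group: `b` is the aggregated column, `c` the ordering column.
   #[derive(Clone, Copy, Debug, PartialEq)]
   struct Row {
       b: i64,
       c: i64,
   }

   /// SUM(b): the result does not depend on the order of the rows.
   fn sum_b(rows: &[Row]) -> i64 {
       rows.iter().map(|r| r.b).sum()
   }

   /// FIRST_VALUE(b) with respect to the order in which rows arrive.
   fn first_b(rows: &[Row]) -> Option<i64> {
       rows.first().map(|r| r.b)
   }

   /// LAST_VALUE(b) with respect to the order in which rows arrive.
   fn last_b(rows: &[Row]) -> Option<i64> {
       rows.last().map(|r| r.b)
   }

   /// ARRAY_AGG(b): collects values in arrival order, so it is order sensitive.
   fn array_agg_b(rows: &[Row]) -> Vec<i64> {
       rows.iter().map(|r| r.b).collect()
   }

   /// Hypothetical sample group, already sorted by `c ASC`.
   fn sample_asc() -> Vec<Row> {
       vec![Row { b: 10, c: 1 }, Row { b: 30, c: 2 }, Row { b: 20, c: 3 }]
   }

   /// The same rows sorted by `c DESC`.
   fn sorted_desc(rows: &[Row]) -> Vec<Row> {
       let mut out = rows.to_vec();
       out.sort_by(|x, y| y.c.cmp(&x.c));
       out
   }
   ```

   With these definitions, `sum_b` agrees on both orderings, `first_b` over the
   `c DESC` rows equals `last_b` over the `c ASC` rows, and `array_agg_b` gives
   different arrays for the two orderings.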
   In short, for order-insensitive aggregators we should implement
   `fn reverse_expr` by returning a clone of the existing aggregator, to
   communicate that the same result would be generated in reverse order (in
   fact, in any arbitrary permutation).
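
   As a rough sketch of that contract, assuming a deliberately simplified model:
   the `Agg` enum and `reverse_all` helper below are hypothetical and do not
   mirror DataFusion's actual trait or types. Only the `Some(clone)` versus
   `None` convention is taken from the explanation above.

   ```rust
   /// Hypothetical, simplified stand-in for an aggregate expression.
   #[derive(Clone, Debug, PartialEq)]
   enum Agg {
       Sum,
       FirstValue { descending: bool },
       LastValue { descending: bool },
       ArrayAgg { descending: bool },
   }

   impl Agg {
       /// Returns an aggregate that produces the same result when the input is
       /// scanned in the opposite order, or `None` if there is no such aggregate.
       fn reverse_expr(&self) -> Option<Agg> {
           match self {
               // Order insensitive: the reverse is the aggregate itself.
               Agg::Sum => Some(self.clone()),
               // FIRST_VALUE(.. ORDER BY c DESC) == LAST_VALUE(.. ORDER BY c ASC).
               Agg::FirstValue { descending } => Some(Agg::LastValue {
                   descending: !*descending,
               }),
               Agg::LastValue { descending } => Some(Agg::FirstValue {
                   descending: !*descending,
               }),
               // Modeled as non-reversible, as described in this comment.
               Agg::ArrayAgg { .. } => None,
           }
       }
   }

   /// Reverses every aggregate of a query, or returns `None` if any of them
   /// cannot be reversed (collecting into `Option<Vec<_>>` stops at the first `None`).
   fn reverse_all(aggs: &[Agg]) -> Option<Vec<Agg>> {
       aggs.iter().map(Agg::reverse_expr).collect()
   }
   ```

   In this model, `[Sum, FirstValue { descending: true }]` reverses to
   `[Sum, LastValue { descending: false }]`, while any list containing
   `ArrayAgg` yields `None`. If `Sum` returned `None` instead of a clone, the
   first query could not be rewritten either.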


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

