findepi commented on PR #16625:
URL: https://github.com/apache/datafusion/pull/16625#issuecomment-3093779036

   
   
   
   > I re-read the comments on this PR and I wonder if you tried implementing 
the solution suggested by @ozankabak in [#16625 
(comment)](https://github.com/apache/datafusion/pull/16625#issuecomment-3019757986):
   > 
   > > If getting this over the finish line in the short term is important to 
you, I think a reasonable step forward is to take a look at what solving it 
entails: There are basically two steps: (1) AggregateExec needs to consult the 
UDAF definition as it forms its required input ordering (this is the easy 
step), (2) The enforce sorting rule needs to address the case when there is 
already a sort with a prefix of a soft requirement, and just extend the sort 
keys (this is the harder step).
   > > I think the solution will ultimately be small in terms of LOC changes, 
but the second step will require some thinking.
   > 
   > This PR seems similar except that it adds the `SoftRequirement` stage as 
well. If we could avoid the need for `SoftRequirement` I think this PR would be 
pretty great
   
   https://github.com/apache/datafusion/pull/16625#discussion_r2178514868
   https://github.com/apache/datafusion/pull/16625#issuecomment-3029193406
   
   I am not convinced it's actually desired to make the existing `Beneficial` 
behave like the new `SoftRequirement` does though.
   
   Consider first_value / top_1 function, which is O(n). It benefits from the 
input being sorted, becoming O(1) in such case. It does not, however, want to 
impose input sorting, as that would be O(n log n).
   This is different from e.g. sorted array_agg, which needs to sort either way.
   
   One can argue that sorting for "first_value order by a, b" can be added only 
if there already is some sorting on a. It's not a bad argument, but note that 
within the group of least a value, it's still O(group size) -> O(group size * 
log group size) change.
   
   Thus, it seems optimal to be able to distinguish functions that
   
   - benefit from ordering, e.g. first_value (`Beneficial`)
   - those are simply better if input can be ordered, e.g. ordered array_agg in 
this PR (`SoftRequirement` being added in this PR)
   - those which cannot execute if input is not pre-ordered, e.g. ordered 
array_agg before this PR (`HardRequirement`)
   - those which do not care about input ordering (`Insensitive`)
   
   Thus it makes sense for the `AggregateOrderSensitivity` to have 4 options.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to