Jefffrey commented on PR #17805:
URL: https://github.com/apache/datafusion/pull/17805#issuecomment-3418476797

   > What do you mean "strictly apply" ? As in return an error for queries with 
`WITHIN GROUP` for an aggregate that 
`AggregateUDFImpl::is_ordered_set_aggregate() --> false`?
   
   Yes, since that syntax is essentially ignored in that case, so I think it's 
better to explicit error if users try do this.
   
   > Rather than just `ascending` how about taking in `expression: SortExpr`, 
which would also allow specifying `NULLS FIRST` and/or `NULLS LAST` ?
   
   Ordered-set aggregates actually ignore nulls completely in both postgres & 
duckdb, so we wouldn't need to consider null orders.
   
   - Apparently hypothetical-set aggregate functions in postgres do respect 
nulls, but that is a separate feature
   
   This made me realise though that we have `supports_null_handling_clause`:
   
   
https://github.com/apache/datafusion/blob/93f136c06dcb6d4cb362110ae5a4b2b3b8571bb7/datafusion/expr/src/udaf.rs#L743-L747
   
   And this would be tied to `is_ordered_set_aggregate` as it doesn't make 
sense to have both be true; I'll look more into this separately.
   
   > I have one more question, which is "what should we do with just ORDER BY 
clause in the argument? For example
   >
   > ```sql
   > SELECT approx_percentile_cont(expr ORDER BY time, 0.5) FROM ...
   > ```
   >
   > Would we treat that `ORDER BY` the same as if it were specified in the 
`WITHIN GROUP` clause?
   
   Ok I didn't realize we supported this syntax 😅 
   
   Time for another rabbit hole.
   
   **Reference: DuckDB**
   
   Base case:
   
   ```sql
   D select quantile_cont(col0, 0.75) from values (1, 3), (2, 2), (3, 1) t;
   ┌───────────────────────────┐
   │ quantile_cont(col0, 0.75) │
   │          double           │
   ├───────────────────────────┤
   │            2.5            │
   └───────────────────────────┘
   ```
   
   DuckDB allows an order by after the arguments:
   
   ```sql
   D select quantile_cont(col0, 0.75 order by col0) from values (1, 3), (2, 2), 
(3, 1) t;
   ┌─────────────────────────────────────────┐
   │ quantile_cont(col0, 0.75 ORDER BY col0) │
   │                 double                  │
   ├─────────────────────────────────────────┤
   │                   2.5                   │
   └─────────────────────────────────────────┘
   D select quantile_cont(col0, 0.75 order by col0 asc) from values (1, 3), (2, 
2), (3, 1) t;
   ┌─────────────────────────────────────────────┐
   │ quantile_cont(col0, 0.75 ORDER BY col0 ASC) │
   │                   double                    │
   ├─────────────────────────────────────────────┤
   │                     2.5                     │
   └─────────────────────────────────────────────┘
   D select quantile_cont(col0, 0.75 order by col0 desc) from values (1, 3), 
(2, 2), (3, 1) t;
   ┌──────────────────────────────────────────────┐
   │ quantile_cont(col0, 0.75 ORDER BY col0 DESC) │
   │                    double                    │
   ├──────────────────────────────────────────────┤
   │                     1.5                      │
   └──────────────────────────────────────────────┘
   ```
   
   Interestingly, if you try order by a separate column (`col1`) it behaves as 
if you're ordering by `col0` instead:
   
   ```sql
   D select quantile_cont(col0, 0.75 order by col1 asc) from values (1, 3), (2, 
2), (3, 1) t;
   ┌─────────────────────────────────────────────┐
   │ quantile_cont(col0, 0.75 ORDER BY col1 ASC) │
   │                   double                    │
   ├─────────────────────────────────────────────┤
   │                     2.5                     │
   └─────────────────────────────────────────────┘
   D select quantile_cont(col0, 0.75 order by col1 desc) from values (1, 3), 
(2, 2), (3, 1) t;
   ┌──────────────────────────────────────────────┐
   │ quantile_cont(col0, 0.75 ORDER BY col1 DESC) │
   │                    double                    │
   ├──────────────────────────────────────────────┤
   │                     1.5                      │
   └──────────────────────────────────────────────┘
   ```
   
   See how `col0` and `col1` have values in reverse of each other; we'd expect 
`quantile_cont(col0, 0.75 order by col0 asc)` to be the same as 
`quantile_cont(col0, 0.75 order by col1 desc)` but this is not the case. So we 
can surmise that it just looks at the `asc`/`desc` part and ignores the actual 
column in the `ORDER BY`, considering only the column in the first argument.
   
   This syntax is also supported:
   
   ```sql
   D select quantile_cont(0.75 order by col0) from values (1, 3), (2, 2), (3, 
1) t;
   ┌───────────────────────────────────┐
   │ quantile_cont(0.75 ORDER BY col0) │
   │              double               │
   ├───────────────────────────────────┤
   │                2.5                │
   └───────────────────────────────────┘
   D select quantile_cont(0.75 order by col0 asc) from values (1, 3), (2, 2), 
(3, 1) t;
   ┌───────────────────────────────────────┐
   │ quantile_cont(0.75 ORDER BY col0 ASC) │
   │                double                 │
   ├───────────────────────────────────────┤
   │                  2.5                  │
   └───────────────────────────────────────┘
   D select quantile_cont(0.75 order by col0 desc) from values (1, 3), (2, 2), 
(3, 1) t;
   ┌────────────────────────────────────────┐
   │ quantile_cont(0.75 ORDER BY col0 DESC) │
   │                 double                 │
   ├────────────────────────────────────────┤
   │                  1.5                   │
   └────────────────────────────────────────┘
   ```
   
   It is similar to `WITHIN GROUP` in that it implicitly takes the column you 
are ordering by as the column you want to compute on.
   
   Lastly, it is a syntax error if you try `ORDER BY` before the last argument:
   
   ```sql
   D select quantile_cont(col0 order by col0, 0.75) from values (1, 3), (2, 2), 
(3, 1) t;
   Binder Error:
   ORDER BY non-integer literal has no effect.
   * SET order_by_non_integer_literal=true to allow this behavior.
   
   Perhaps you misplaced ORDER BY; ORDER BY must appear after all regular 
arguments of the aggregate.
   
   LINE 1: select quantile_cont(col0 order by col0, 0.75) from values (1, 3), 
(2, 2), (3, 1) t;
   ```
   
   **DataFusion**
   
   We support having the `ORDER BY` inside but after the arguments but it 
doesn't seem to do anything:
   
   ```sql
   > select quantile_cont(col0, 0.75 order by col0) from values (1, 3), (2, 2), 
(3, 1) t(col0, col1);
   +-------------------------------------+
   | quantile_cont(t.col0,Float64(0.75)) |
   +-------------------------------------+
   | 2.5                                 |
   +-------------------------------------+
   1 row(s) fetched.
   Elapsed 0.009 seconds.
   
   > select quantile_cont(col0, 0.75 order by col0 desc) from values (1, 3), 
(2, 2), (3, 1) t(col0, col1);
   +-------------------------------------+
   | quantile_cont(t.col0,Float64(0.75)) |
   +-------------------------------------+
   | 2.5                                 |
   +-------------------------------------+
   1 row(s) fetched.
   Elapsed 0.011 seconds.
   ```
   
   - Expect output to change since we change sort order
   
   This doesn't work either:
   
   ```sql
   > select quantile_cont(0.75 order by col0) from values (1, 3), (2, 2), (3, 
1) t(col0, col1);
   Error during planning: Failed to coerce arguments to satisfy a call to 
'percentile_cont' function: coercion from Float64 to the signature 
OneOf([Exact([Int8, Float64]), Exact([Int16, Float64]), Exact([Int32, 
Float64]), Exact([Int64, Float64]), Exact([UInt8, Float64]), Exact([UInt16, 
Float64]), Exact([UInt32, Float64]), Exact([UInt64, Float64]), Exact([Float32, 
Float64]), Exact([Float64, Float64])]) failed No function matches the given 
name and argument types 'percentile_cont(Float64)'. You might need to add 
explicit type casts.
           Candidate functions:
           percentile_cont(Int8, Float64)
           percentile_cont(Int16, Float64)
           percentile_cont(Int32, Float64)
           percentile_cont(Int64, Float64)
           percentile_cont(UInt8, Float64)
           percentile_cont(UInt16, Float64)
           percentile_cont(UInt32, Float64)
           percentile_cont(UInt64, Float64)
           percentile_cont(Float32, Float64)
           percentile_cont(Float64, Float64)
   ```
   
   - Looks like it pretty much ignores the `order by col0` as it sees the input 
as `'percentile_cont(Float64)'`
   
   Same if you put the order by after the first argument:
   
   ```sql
   > select quantile_cont(col0 order by col0, 0.75) from values (1, 3), (2, 2), 
(3, 1) t(col0, col1);
   Error during planning: Failed to coerce arguments to satisfy a call to 
'percentile_cont' function: coercion from Int64 to the signature 
OneOf([Exact([Int8, Float64]), Exact([Int16, Float64]), Exact([Int32, 
Float64]), Exact([Int64, Float64]), Exact([UInt8, Float64]), Exact([UInt16, 
Float64]), Exact([UInt32, Float64]), Exact([UInt64, Float64]), Exact([Float32, 
Float64]), Exact([Float64, Float64])]) failed No function matches the given 
name and argument types 'percentile_cont(Int64)'. You might need to add 
explicit type casts.
           Candidate functions:
           percentile_cont(Int8, Float64)
           percentile_cont(Int16, Float64)
           percentile_cont(Int32, Float64)
           percentile_cont(Int64, Float64)
           percentile_cont(UInt8, Float64)
           percentile_cont(UInt16, Float64)
           percentile_cont(UInt32, Float64)
           percentile_cont(UInt64, Float64)
           percentile_cont(Float32, Float64)
           percentile_cont(Float64, Float64)
   ```
   
   - This time it sees `'percentile_cont(Int64)'` which means it ignores the 
`0.75` fraction?
   
   And if we try using it with `WITHIN GROUP`:
   
   ```sql
   > select quantile_cont(col0, 0.75 order by col0 asc) within group (order by 
col0) from values (1, 3), (2, 2), (3, 1) t(col0, col1);
   Error during planning: ORDER BY and WITHIN GROUP clauses cannot be used 
together in the same aggregate function
   > select quantile_cont(0.75 order by col0 asc) within group (order by col0) 
from values (1, 3), (2, 2), (3, 1) t(col0, col1);
   Error during planning: ORDER BY and WITHIN GROUP clauses cannot be used 
together in the same aggregate function
   > select quantile_cont(col0 order by col0, 0.75) within group (order by 
col0) from values (1, 3), (2, 2), (3, 1) t(col0, col1);
   Error during planning: ORDER BY and WITHIN GROUP clauses cannot be used 
together in the same aggregate function
   ```
   
   We get a nice SQL error.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to