vbarua commented on code in PR #13511:
URL: https://github.com/apache/datafusion/pull/13511#discussion_r1979981659
##########
datafusion/expr/src/expr.rs:
##########
@@ -295,6 +295,8 @@ pub enum Expr {
/// See also [`ExprFunctionExt`] to set these fields.
///
/// [`ExprFunctionExt`]: crate::expr_fn::ExprFunctionExt
+ ///
+ /// cf. `WITHIN GROUP` is converted to `ORDER BY` internally in
`datafusion/sql/src/expr/function.rs`
Review Comment:
minor/opinionated: I'm not sure if it's worth mentioning this at all here.
`WITHIN GROUP` is effectively an `ORDER BY` specified differently. This only
matters at the SQL layer, and you handle and explain it there already.
##########
datafusion/sql/src/expr/function.rs:
##########
@@ -349,15 +365,49 @@ impl<S: ContextProvider> SqlToRel<'_, S> {
} else {
// User defined aggregate functions (UDAF) have precedence in case
it has the same name as a scalar built-in function
if let Some(fm) = self.context_provider.get_aggregate_meta(&name) {
- let order_by = self.order_by_to_sort_expr(
- order_by,
- schema,
- planner_context,
- true,
- None,
- )?;
- let order_by = (!order_by.is_empty()).then_some(order_by);
- let args = self.function_args_to_expr(args, schema,
planner_context)?;
+ if fm.is_ordered_set_aggregate() && within_group.is_empty() {
+ return plan_err!("WITHIN GROUP clause is required when
calling ordered set aggregate function({})", fm.name());
+ }
+
+ if null_treatment.is_some() &&
!fm.supports_null_handling_clause() {
+ return plan_err!(
+ "[IGNORE | RESPECT] NULLS are not permitted for {}",
+ fm.name()
+ );
+ }
Review Comment:
Per my point about `[IGNORE | RESPECT] NULLS` being a property of _window
functions_, I don't think we need this check here.
##########
datafusion/sql/src/expr/function.rs:
##########
@@ -177,8 +180,14 @@ impl FunctionArgs {
}
}
- if !within_group.is_empty() {
- return not_impl_err!("WITHIN GROUP is not supported yet:
{within_group:?}");
+ if !within_group.is_empty() && order_by.is_some() {
+ return plan_err!("ORDER BY clause is only permitted in WITHIN
GROUP clause when a WITHIN GROUP is used");
+ }
+
+ if within_group.len() > 1 {
+ return not_impl_err!(
+ "Multiple column ordering in WITHIN GROUP clause is not
supported"
Review Comment:
Minor wording suggestion
```
Only a single ordering expression is permitted in a WITHIN GROUP clause
```
which explicitly points users to what they should do, instead of telling
them what they can't.
##########
datafusion/functions-aggregate/src/approx_percentile_cont.rs:
##########
@@ -51,29 +52,39 @@ create_func!(ApproxPercentileCont,
approx_percentile_cont_udaf);
/// Computes the approximate percentile continuous of a set of numbers
pub fn approx_percentile_cont(
- expression: Expr,
+ within_group: Sort,
Review Comment:
minor/opinionated: I think `order_by` would be a clearer name for this, as
the `WITHIN GROUP` is really just a wrapper around the `ORDER BY` clause.
##########
datafusion/sql/src/expr/function.rs:
##########
@@ -177,8 +180,14 @@ impl FunctionArgs {
}
}
- if !within_group.is_empty() {
- return not_impl_err!("WITHIN GROUP is not supported yet:
{within_group:?}");
+ if !within_group.is_empty() && order_by.is_some() {
+ return plan_err!("ORDER BY clause is only permitted in WITHIN
GROUP clause when a WITHIN GROUP is used");
Review Comment:
minor: I just noticed that in the block above there is a check for duplicate
order bys. I think it would be good to fold this into that check
```rust
FunctionArgumentClause::OrderBy(oby) => {
if order_by.is_some() { // can check for within group
here
return not_impl_err!("Calling {name}: Duplicated
ORDER BY clause in function arguments");
}
order_by = Some(oby);
}
```
to consolidate the handling into one place.
##########
datafusion/functions-aggregate/src/approx_percentile_cont.rs:
##########
@@ -131,6 +147,19 @@ impl ApproxPercentileCont {
args: AccumulatorArgs,
) -> Result<ApproxPercentileAccumulator> {
let percentile = validate_input_percentile_expr(&args.exprs[1])?;
+
+ let is_descending = args
+ .ordering_req
+ .first()
+ .map(|sort_expr| sort_expr.options.descending)
+ .unwrap_or(false);
+
+ let percentile = if is_descending {
+ 1.0 - percentile
+ } else {
+ percentile
+ };
Review Comment:
This seems reasonable to me, but I don't have that much experience on the
execution side of things.
##########
datafusion/expr/src/udaf.rs:
##########
@@ -845,6 +861,19 @@ pub trait AggregateUDFImpl: Debug + Send + Sync {
ScalarValue::try_from(data_type)
}
+ /// If this function supports `[IGNORE NULLS | RESPECT NULLS]` clause,
return true
+ /// If the function does not, return false
+ /// Otherwise return None (the default)
+ fn supports_null_handling_clause(&self) -> Option<bool> {
+ None
+ }
Review Comment:
This was smelling odd, so I dug a bit deeper. I think you've inadvertantly
stumbled into something even weirder than you anticipated
The example you've linked is
```sql
SELECT FIRST_VALUE(column1) RESPECT NULLS FROM t;
```
which I don't think is a valid query because `first_value` _should not_ be
an aggregate function, or at the very least the above query is not valid in
most SQL dialects. `first_value` is actually a window function in other engines
(eg. [Trino](https://trino.io/docs/current/functions/window.html#first_value),
[Postgres](https://www.postgresql.org/docs/17/functions-window.html),
[MySQL](https://dev.mysql.com/doc/refman/8.4/en/window-function-descriptions.html#function_first-value)).
If you try running something like
```sql
SELECT first_value(column1) FROM t;
```
against Postgres you get an error like
```
Query Error: window function first_value requires an OVER clause
```
[dbfiddle](https://www.db-fiddle.com/f/oMU1DDB2915nrHLKqUVj9a/0)
The `RESPECT NULLS | IGNORE NULLS` options is only a property of certain
_window_ functions, hence we shouldn't need to track it for aggregate functions.
I'm going to file a ticket for the above.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]