geoffreyclaude commented on code in PR #14413:
URL: https://github.com/apache/datafusion/pull/14413#discussion_r1962026333


##########
datafusion/sqllogictest/test_files/aggregate.slt:
##########
@@ -176,6 +176,17 @@ FROM
 ----
 [0VVIHzxWtNOFLtnhjHEKjXaJOSLJfm, 0keZ5G8BffGwgF2RwQD59TFzMStxCB, 
0og6hSkhbX8AC1ktFS4kounvTzy8Vo, 1aOcrEGd0cOqZe2I5XBOm0nDcwtBZO, 
2T3wSlHdEmASmO0xcXHnndkKEt6bz8]
 
+# array agg can use order by with distinct
+query ?
+SELECT array_agg(DISTINCT c13 ORDER BY c13)

Review Comment:
   Could you also add tests that trigger errors? Ideally, the two cases you've 
added in `array_agg.rs`.



##########
datafusion/functions-aggregate/src/array_agg.rs:
##########
@@ -131,7 +133,32 @@ impl AggregateUDFImpl for ArrayAgg {
         let data_type = acc_args.exprs[0].data_type(acc_args.schema)?;
 
         if acc_args.is_distinct {
-            return 
Ok(Box::new(DistinctArrayAggAccumulator::try_new(&data_type)?));
+            // Limitation similar to Postgres. The aggregation function can 
only mix
+            // DISTINCT and ORDER BY if all the expressions in the ORDER BY 
appear
+            // also in the arguments of the function. For example:
+            //
+            // ARRAY_AGG(DISTINCT col)
+            //
+            // can only be mixed with an ORDER BY if the order expression is 
"col".
+            //
+            // ARRAY_AGG(DISTINCT col ORDER BY col)                         <- 
Valid
+            // ARRAY_AGG(DISTINCT concat(col, '') ORDER BY concat(col, '')) <- 
Valid
+            // ARRAY_AGG(DISTINCT col ORDER BY other_col)                   <- 
Invalid
+            // ARRAY_AGG(DISTINCT col ORDER BY concat(col, ''))             <- 
Invalid
+            if acc_args.ordering_req.len() > 1 {
+                return exec_err!("In an aggregate with DISTINCT, ORDER BY 
expressions must appear in argument list");
+            }
+            let mut sort_option: Option<SortOptions> = None;
+            if let Some(order) = acc_args.ordering_req.first() {
+                if !order.expr.eq(&acc_args.exprs[0]) {
+                    return exec_err!("In an aggregate with DISTINCT, ORDER BY 
expressions must appear in argument list");

Review Comment:
   Since there can be only one `ORDER BY` expression, the error message should 
refer to it in singular. I'd suggest:
   ```rust
   return exec_err!("In an aggregate with DISTINCT, the ORDER BY expression 
must match the aggregate argument");
   ```



##########
datafusion/functions-aggregate/src/array_agg.rs:
##########
@@ -131,7 +133,32 @@ impl AggregateUDFImpl for ArrayAgg {
         let data_type = acc_args.exprs[0].data_type(acc_args.schema)?;
 
         if acc_args.is_distinct {
-            return 
Ok(Box::new(DistinctArrayAggAccumulator::try_new(&data_type)?));
+            // Limitation similar to Postgres. The aggregation function can 
only mix
+            // DISTINCT and ORDER BY if all the expressions in the ORDER BY 
appear
+            // also in the arguments of the function. For example:
+            //
+            // ARRAY_AGG(DISTINCT col)
+            //
+            // can only be mixed with an ORDER BY if the order expression is 
"col".
+            //
+            // ARRAY_AGG(DISTINCT col ORDER BY col)                         <- 
Valid
+            // ARRAY_AGG(DISTINCT concat(col, '') ORDER BY concat(col, '')) <- 
Valid
+            // ARRAY_AGG(DISTINCT col ORDER BY other_col)                   <- 
Invalid
+            // ARRAY_AGG(DISTINCT col ORDER BY concat(col, ''))             <- 
Invalid
+            if acc_args.ordering_req.len() > 1 {
+                return exec_err!("In an aggregate with DISTINCT, ORDER BY 
expressions must appear in argument list");

Review Comment:
   This looks like a copy-paste typo of the error message? Shouldn't it rather 
be something more like:
   ```rust
   return exec_err!("In an aggregate with DISTINCT, only one ORDER BY 
expression is allowed");
   ```
   ?



##########
datafusion/functions-aggregate/src/array_agg.rs:
##########
@@ -131,7 +133,32 @@ impl AggregateUDFImpl for ArrayAgg {
         let data_type = acc_args.exprs[0].data_type(acc_args.schema)?;
 
         if acc_args.is_distinct {
-            return 
Ok(Box::new(DistinctArrayAggAccumulator::try_new(&data_type)?));
+            // Limitation similar to Postgres. The aggregation function can 
only mix

Review Comment:
   This doc comment is great! Would it be possible to make it more user-facing? 
Maybe as a rustdoc over the `ArrayAgg` structure? As well as in the `ArrayAgg` 
documentation if it exists?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to