gabotechs commented on code in PR #14413:
URL: https://github.com/apache/datafusion/pull/14413#discussion_r1963100216


##########
datafusion/functions-aggregate/src/array_agg.rs:
##########
@@ -131,7 +133,32 @@ impl AggregateUDFImpl for ArrayAgg {
         let data_type = acc_args.exprs[0].data_type(acc_args.schema)?;
 
         if acc_args.is_distinct {
-            return 
Ok(Box::new(DistinctArrayAggAccumulator::try_new(&data_type)?));
+            // Limitation similar to Postgres. The aggregation function can 
only mix
+            // DISTINCT and ORDER BY if all the expressions in the ORDER BY 
appear
+            // also in the arguments of the function. For example:
+            //
+            // ARRAY_AGG(DISTINCT col)
+            //
+            // can only be mixed with an ORDER BY if the order expression is 
"col".
+            //
+            // ARRAY_AGG(DISTINCT col ORDER BY col)                         <- 
Valid
+            // ARRAY_AGG(DISTINCT concat(col, '') ORDER BY concat(col, '')) <- 
Valid
+            // ARRAY_AGG(DISTINCT col ORDER BY other_col)                   <- 
Invalid
+            // ARRAY_AGG(DISTINCT col ORDER BY concat(col, ''))             <- 
Invalid
+            if acc_args.ordering_req.len() > 1 {
+                return exec_err!("In an aggregate with DISTINCT, ORDER BY 
expressions must appear in argument list");
+            }
+            let mut sort_option: Option<SortOptions> = None;
+            if let Some(order) = acc_args.ordering_req.first() {
+                if !order.expr.eq(&acc_args.exprs[0]) {
+                    return exec_err!("In an aggregate with DISTINCT, ORDER BY 
expressions must appear in argument list");

Review Comment:
   > Since there can be only one ORDER BY expression, the error message should 
refer to it in singular
   
   This was pretty much the error message that Postgres would return, but they 
have this limitation as a general rule for all aggregation functions, which 
does not seem to be the case for DataFusion. The other option would be to fully 
customize the error message for the specific ARRAY_AGG case, which will result 
in a less standard but more informative error message. Any opinions here @alamb 
?
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to