gabotechs commented on code in PR #14413:
URL: https://github.com/apache/datafusion/pull/14413#discussion_r1963095533


##########
datafusion/functions-aggregate/src/array_agg.rs:
##########
@@ -131,7 +133,32 @@ impl AggregateUDFImpl for ArrayAgg {
         let data_type = acc_args.exprs[0].data_type(acc_args.schema)?;
 
         if acc_args.is_distinct {
-            return 
Ok(Box::new(DistinctArrayAggAccumulator::try_new(&data_type)?));
+            // Limitation similar to Postgres. The aggregation function can 
only mix
+            // DISTINCT and ORDER BY if all the expressions in the ORDER BY 
appear
+            // also in the arguments of the function. For example:
+            //
+            // ARRAY_AGG(DISTINCT col)
+            //
+            // can only be mixed with an ORDER BY if the order expression is 
"col".
+            //
+            // ARRAY_AGG(DISTINCT col ORDER BY col)                         <- 
Valid
+            // ARRAY_AGG(DISTINCT concat(col, '') ORDER BY concat(col, '')) <- 
Valid
+            // ARRAY_AGG(DISTINCT col ORDER BY other_col)                   <- 
Invalid
+            // ARRAY_AGG(DISTINCT col ORDER BY concat(col, ''))             <- 
Invalid
+            if acc_args.ordering_req.len() > 1 {
+                return exec_err!("In an aggregate with DISTINCT, ORDER BY 
expressions must appear in argument list");

Review Comment:
   The restriction goes further than that: all the expressions used in the 
ORDER BY must appear in as arguments of the aggregation function. In this case, 
ARRAY_AGG accepts one argument, therefore only that 1 expression is also valid 
as an ORDER BY expression.
   
   I took the error message from Postgres, but maybe it's better to customize 
it to the ARRAY_AGG function.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to