mustafasrepo commented on code in PR #6482:
URL: https://github.com/apache/arrow-datafusion/pull/6482#discussion_r1214335912


##########
datafusion/core/tests/sqllogictests/test_files/groupby.slt:
##########
@@ -2384,3 +2384,197 @@ SELECT s.country, ARRAY_AGG(s.amount ORDER BY s.country 
DESC, s.amount DESC) AS
 FRA [200.0, 50.0] 250
 GRC [80.0, 30.0] 110
 TUR [100.0, 75.0] 175
+
+# test_reverse_aggregate_expr
+# Some of the Aggregators can be reversed, by this way we can still run 
aggregators
+# that have contradictory requirements at first glance.
+query TT
+EXPLAIN SELECT country, ARRAY_AGG(amount ORDER BY amount DESC) AS amounts,
+  FIRST_VALUE(amount ORDER BY amount ASC) AS fv1,
+  LAST_VALUE(amount ORDER BY amount DESC) AS fv2
+  FROM sales_global
+  GROUP BY country
+----
+logical_plan
+Projection: sales_global.country, ARRAY_AGG(sales_global.amount) ORDER BY 
[sales_global.amount DESC NULLS FIRST] AS amounts, 
FIRST_VALUE(sales_global.amount) ORDER BY [sales_global.amount ASC NULLS LAST] 
AS fv1, LAST_VALUE(sales_global.amount) ORDER BY [sales_global.amount DESC 
NULLS FIRST] AS fv2
+--Aggregate: groupBy=[[sales_global.country]], 
aggr=[[ARRAY_AGG(sales_global.amount) ORDER BY [sales_global.amount DESC NULLS 
FIRST], FIRST_VALUE(sales_global.amount) ORDER BY [sales_global.amount ASC 
NULLS LAST], LAST_VALUE(sales_global.amount) ORDER BY [sales_global.amount DESC 
NULLS FIRST]]]
+----TableScan: sales_global projection=[country, amount]
+physical_plan
+ProjectionExec: expr=[country@0 as country, ARRAY_AGG(sales_global.amount) 
ORDER BY [sales_global.amount DESC NULLS FIRST]@1 as amounts, 
FIRST_VALUE(sales_global.amount) ORDER BY [sales_global.amount ASC NULLS 
LAST]@2 as fv1, LAST_VALUE(sales_global.amount) ORDER BY [sales_global.amount 
DESC NULLS FIRST]@3 as fv2]
+--AggregateExec: mode=Single, gby=[country@0 as country], 
aggr=[ARRAY_AGG(sales_global.amount), FIRST_VALUE(sales_global.amount), 
LAST_VALUE(sales_global.amount)]

Review Comment:
   @alamb I have changed the `name` implementation. It is now calculated inside 
the aggregator, it is not given from outside during initialization. The display 
of the AggregateExec changed from 
   `--AggregateExec: mode=Single, gby=[country@0 as country], 
aggr=[ARRAY_AGG(sales_global.amount), FIRST_VALUE(sales_global.amount), 
LAST_VALUE(sales_global.amount)]` to 
   `--AggregateExec: mode=Single, gby=[country@0 as country], 
aggr=[ARRAY_AGG(sales_global.amount), LAST_VALUE(amount@1), 
LAST_VALUE(amount@1)]`
   
   Since argument expressions are `Arc<dyn PhysicalExpr>`, output display of 
the argument is changed from `sales_global.amount` to `amount@1` for 
`FIRST_VALUE` AND `LAST_VALUE`. 
   
   This change, fortunately lead me to recognize a bug during order-sensitive 
aggregation when partition number is greater than 1.
   In multi partition cases, it is not guaranteed that `Final` or 
`FinalPartitioned` stages in the `AggregateExec` to receive data in-order. 
   Hence I constrained order-sensitive aggregation to single partition cases. 
   I will add support for order-sensitive aggregation at multiple partitions, 
in following PRs.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to