[GitHub] [arrow-datafusion] korowa commented on a diff in pull request #7385: fix: inconsistent scalar types in `DistinctArrayAggAccumulator` state

via GitHub Thu, 24 Aug 2023 07:58:56 -0700


korowa commented on code in PR #7385:
URL: https://github.com/apache/arrow-datafusion/pull/7385#discussion_r1304469204



##########
datafusion/sqllogictest/test_files/aggregate.slt:
##########
@@ -1271,14 +1271,31 @@ NULL 4 29 1.260869565217 123 -117 23
 NULL 5 -194 -13.857142857143 118 -101 14
 NULL NULL 781 7.81 125 -117 100
 
-# TODO this querys output is non determinisitic (the order of the elements
-# differs run to run
+# TODO: array_agg_distinct output is non-determinisitic -- rewrite with array_sort(list_sort)
+#       unnest is also not available, so manually unnesting via CROSS JOIN
+# additional count(1) forces array_agg_distinct instead of array_agg over aggregated by c2 data
 #
 # csv_query_array_agg_distinct
-# query T
-# SELECT array_agg(distinct c2) FROM aggregate_test_100
-# ----
-# [4, 2, 3, 5, 1]
+query III
+WITH indices AS (

Review Comment:
   The actual aggregation input is `aggregate_test_100.c2`, in the [subquery](https://github.com/apache/arrow-datafusion/pull/7385/files#diff-a60d69cf923e18b0532e7ba3470c4b728e6b00230b3ea242f3fbe95e899c5ffeR1289).
   These indices are needed only to check the query output: I wasn't able to find a better way to compare a non-deterministic array. More details are in the [thread](https://github.com/apache/arrow-datafusion/pull/7385#discussion_r1303147504) above.
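
   For readers following along, the idea of checking an unordered distinct aggregate through an index join can be sketched outside SQL. This is a rough illustration only, not the test from the PR: the sample values and the helper names `array_agg_distinct` and `unnest_by_index` are hypothetical, and a Python `set` stands in for an aggregation whose element order is not guaranteed.

```python
# Hypothetical stand-in for the aggregate_test_100.c2 column.
c2 = [4, 2, 3, 5, 1, 2, 4, 5, 1, 3]


def array_agg_distinct(values):
    """Distinct aggregation; the order of the resulting elements is not guaranteed."""
    return list(set(values))


def unnest_by_index(arr, indices):
    """Pair one array with a list of 1-based indices (the CROSS JOIN),
    emit (element, array_length) rows, and sort them (the ORDER BY).
    Indices past the end of the array are skipped in this sketch."""
    rows = [(arr[i - 1], len(arr)) for i in indices if i <= len(arr)]
    return sorted(rows)


indices = [1, 2, 3, 4, 5]
rows = unnest_by_index(array_agg_distinct(c2), indices)
for row in rows:
    print(row)
```

   Because the rows are sorted after unnesting, the result is the same whatever order the aggregation produced, so it can be compared against a fixed expected output.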



