geoffreyclaude commented on code in PR #19669:
URL: https://github.com/apache/datafusion/pull/19669#discussion_r2667706687


##########
datafusion/sqllogictest/test_files/aggregate.slt:
##########
@@ -3054,6 +3054,44 @@ drop table corr_single_row;
 statement ok
 drop table corr_all_nulls;
 
+# correlation with streaming aggregation (EmitTo::First)
+# Verify that CORR's GroupsAccumulator properly drains state vectors when 
EmitTo::First is called.
+# Set target_partitions to 1 to ensure the optimizer uses streaming 
aggregation (ordering_mode=Sorted) based on the input order.
+statement ok
+set datafusion.execution.target_partitions = 1;
+
+# Bucket 1: CORR = 1, -1, 1, -1 (y varies)
+# Bucket 2: CORR = NULL (y constant, zero variance)
+query IIR
+SELECT bucket, grp, CORR(x, y) FROM (
+    SELECT * FROM (VALUES
+        (1, 1, 1.0, 1.0), (1, 1, 2.0, 2.0),
+        (1, 2, 1.0, 2.0), (1, 2, 2.0, 1.0),
+        (1, 3, 1.0, 1.0), (1, 3, 2.0, 2.0),
+        (1, 4, 1.0, 2.0), (1, 4, 2.0, 1.0),
+        (2, 1, 1.0, 5.0), (2, 1, 2.0, 5.0),
+        (2, 2, 1.0, 5.0), (2, 2, 2.0, 5.0),
+        (2, 3, 1.0, 5.0), (2, 3, 2.0, 5.0),
+        (2, 4, 1.0, 5.0), (2, 4, 2.0, 5.0)
+    ) AS t(bucket, grp, x, y)
+    ORDER BY bucket
+    LIMIT 1000000
+) AS ordered_data
+GROUP BY bucket, grp
+ORDER BY bucket, grp;
+----
+1 1 1
+1 2 -1
+1 3 1
+1 4 -1
+2 1 NULL
+2 2 NULL
+2 3 NULL
+2 4 NULL
+

Review Comment:
   I had it at first, and removed it as I found it too verbose. Same with a 
dedicated unit test in `correlation.rs`, which seemed out of place and only 
serving as a "demo" of the bug.
   
   Adding just the `EXPLAIN` for CORR seems too specific to me here. However, I 
think it would make a lot of sense to actually have a dedicated `.slt` that 
runs EXPLAIN and the actual query for *all* aggregates.
   
   WDYT?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to