martin-g commented on code in PR #19669:
URL: https://github.com/apache/datafusion/pull/19669#discussion_r2668036512


##########
datafusion/sqllogictest/test_files/aggregate.slt:
##########
@@ -3054,6 +3054,44 @@ drop table corr_single_row;
 statement ok
 drop table corr_all_nulls;
 
+# correlation with streaming aggregation (EmitTo::First)
+# Verify that CORR's GroupsAccumulator properly drains state vectors when 
EmitTo::First is called.
+# Set target_partitions to 1 to ensure the optimizer uses streaming 
aggregation (ordering_mode=Sorted) based on the input order.
+statement ok
+set datafusion.execution.target_partitions = 1;
+
+# Bucket 1: CORR = 1, -1, 1, -1 (y varies)
+# Bucket 2: CORR = NULL (y constant, zero variance)
+query IIR
+SELECT bucket, grp, CORR(x, y) FROM (
+    SELECT * FROM (VALUES
+        (1, 1, 1.0, 1.0), (1, 1, 2.0, 2.0),
+        (1, 2, 1.0, 2.0), (1, 2, 2.0, 1.0),
+        (1, 3, 1.0, 1.0), (1, 3, 2.0, 2.0),
+        (1, 4, 1.0, 2.0), (1, 4, 2.0, 1.0),
+        (2, 1, 1.0, 5.0), (2, 1, 2.0, 5.0),
+        (2, 2, 1.0, 5.0), (2, 2, 2.0, 5.0),
+        (2, 3, 1.0, 5.0), (2, 3, 2.0, 5.0),
+        (2, 4, 1.0, 5.0), (2, 4, 2.0, 5.0)
+    ) AS t(bucket, grp, x, y)
+    ORDER BY bucket
+    LIMIT 1000000
+) AS ordered_data
+GROUP BY bucket, grp
+ORDER BY bucket, grp;
+----
+1 1 1
+1 2 -1
+1 3 1
+1 4 -1
+2 1 NULL
+2 2 NULL
+2 3 NULL
+2 4 NULL
+

Review Comment:
   Either way is fine as long as there is a way to assert that it behaves the 
way it is supposed to be.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to