mustafasrepo commented on code in PR #6671:
URL: https://github.com/apache/arrow-datafusion/pull/6671#discussion_r1230531322


##########
datafusion/core/tests/user_defined_aggregates.rs:
##########
@@ -79,15 +98,41 @@ async fn test_udaf_as_window() {
     ];
     assert_batches_eq!(expected, &execute(&ctx, sql).await);
     // aggregate over the entire window function call update_batch
-    assert!(counters.update_batch());
-    assert!(!counters.retract_batch());
+    assert!(test_state.update_batch());
+    assert!(!test_state.retract_batch());
 }
 
 /// User defined aggregate used as a window function with a window frame
 #[tokio::test]
 async fn test_udaf_as_window_with_frame() {
-    let TestContext { ctx, counters } = TestContext::new();
+    let TestContext { ctx, test_state } = TestContext::new();
+    let sql = "SELECT time_sum(time) OVER(ORDER BY time ROWS BETWEEN 1 
PRECEDING AND 1 FOLLOWING) as time_sum from t";
+    let expected = vec![
+        "+----------------------------+",
+        "| time_sum                   |",
+        "+----------------------------+",
+        "| 1970-01-01T00:00:00.000005 |",
+        "| 1970-01-01T00:00:00.000009 |",
+        "| 1970-01-01T00:00:00.000012 |",
+        "| 1970-01-01T00:00:00.000014 |",
+        "| 1970-01-01T00:00:00.000010 |",
+        "+----------------------------+",
+    ];
+    assert_batches_eq!(expected, &execute(&ctx, sql).await);
+    // user defined aggregates with window frame should be calling retract 
batch
+    assert!(test_state.update_batch());
+    assert!(test_state.retract_batch());
+}
+
+/// Ensure that User defined aggregate used as a window function with a window
+/// frame, but that does not implement retract_batch, does not error
+#[tokio::test]
+async fn test_udaf_as_window_with_frame_without_retract_batch() {
+    let test_state = Arc::new(TestState::new().with_error_on_retract_batch());
+
+    let TestContext { ctx, test_state } = 
TestContext::new_with_test_state(test_state);
     let sql = "SELECT time_sum(time) OVER(ORDER BY time ROWS BETWEEN 1 
PRECEDING AND 1 FOLLOWING) as time_sum from t";
+    // TODO: It is not clear why this is a different value than when retract 
batch is used

Review Comment:
   Accumulators when they have window frame startings different than `UNBOUNDED 
PRECEDING` such as `1 preceding` need to implement `retract_batch` method, to 
be able to run correctly (If this method is lacking there is no way to 
calculate result). Consider the query 
   ```sql
   SELECT SUM(a) OVER(ORDER BY a ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) as 
sum_a from t
   ```
   
   First sum value will be the sum of rows between [0, 1), 
   Second sum value will be the sum of  rows between [0, 2), 
   Third sum value will be the sum of  rows between [1, 3), etc.
   Since accumulator keeps in its state, running sum
   for first sum we add to the state sum value between [0, 1)
   for second sum we add to the state sum value between [1, 2) ([0, 1) is 
already in the state sum, hence running sum will cover [0, 2) range)
   for third sum we add to the state sum value between [2, 3) ([0, 2) is 
already in the state sum). Also we need to retract values between [0, 1) by 
this way we can obtain sum between [1, 3) which is indeed the apropriate range. 
   
   When we use `UNBOUNDED PRECEDING ` in the query starting index will always 
be 0 for the desired range. Hence we will never call `retract_batch` method. In 
this case having `retract_batch` is not a requirement. 
   
   This approach is a a bit different than window function approach. In window 
function(when they use frame) they get all the desired range during evaluation. 
If we were to have a window_function sum we could have calculated sum from 
scratch each time inside given range. However, for accumulators this is the 
case, and above query should give error if `retract_batch` is not implemented. 
During review I have pinpointed the section where I think this error is 
prevented
   
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to