Re: [PR] Last Accumulator `update_batch` doesn't take the last value if the order by value are equals [datafusion]

via GitHub Mon, 27 Jan 2025 10:22:38 -0800


korowa commented on code in PR #14232:
URL: https://github.com/apache/datafusion/pull/14232#discussion_r1930999164



##########
datafusion/functions-aggregate/src/first_last.rs:
##########
@@ -701,9 +713,98 @@ fn convert_to_sort_cols(arrs: &[ArrayRef], sort_exprs: 
&LexOrdering) -> Vec<Sort
 #[cfg(test)]
 mod tests {
     use arrow::array::Int64Array;
+    use arrow_schema::Schema;
+    use compute::SortOptions;
+    use datafusion_physical_expr::{expressions::col, PhysicalSortExpr};
 
     use super::*;
 
+    #[test]
+    fn test_last_value_with_order_bys() -> Result<()> {
+        // TODO: Move this kind of test to slt, we don't have a nice way to 
define the batch size for each `update_batch`

Review Comment:
   What I've meant was
   ```sql
   statement ok
   CREATE TABLE memtest (x INT, y INT);
   
   statement ok
   INSERT INTO memtest VALUES (1, 1), (2, 1);
   
   statement ok
   INSERT INTO memtest VALUES (2, 2), (3, 1);
   
   statement ok
   INSERT INTO memtest VALUES (3, 2), (4, 1);
   
   statement ok
   set datafusion.execution.target_partitions = 1;
   
   query II
   select x, last_value(y) from memtest group by x;
   ----
   1 1
   2 2
   3 2
   4 1
   ```
   
   each insert will add new RecordBatch to memory table, so there will be 3 
update_batch calls for the accumulator (it will result in more calls for 
LastValueAccumulator::update_batch though due to splitting inputs by group 
value)
   
   Likely something similar can be done for multiple partitions to make 
round-robin repartitioning produce multiple streams to test merge_batch 
function.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Last Accumulator `update_batch` doesn't take the last value if the order by value are equals [datafusion]

Reply via email to