jayzhan211 commented on code in PR #14232:
URL: https://github.com/apache/datafusion/pull/14232#discussion_r1929924628


##########
datafusion/functions-aggregate/src/first_last.rs:
##########
@@ -701,9 +713,98 @@ fn convert_to_sort_cols(arrs: &[ArrayRef], sort_exprs: &LexOrdering) -> Vec<Sort
 #[cfg(test)]
 mod tests {
     use arrow::array::Int64Array;
+    use arrow_schema::Schema;
+    use compute::SortOptions;
+    use datafusion_physical_expr::{expressions::col, PhysicalSortExpr};
 
     use super::*;
 
+    #[test]
+    fn test_last_value_with_order_bys() -> Result<()> {
+        // TODO: Move this kind of test to slt, we don't have a nice way to define the batch size for each `update_batch`

Review Comment:
   What I want is multiple batches that go through `update_batch` and
`merge_batch`.
   
   ```
   statement count 0
   set datafusion.execution.batch_size = 2;
   
   statement count 0
   create table t(a int, b int) as values (1, 1), (2, 1), (null, 1), (3, 1), (1, 1), (2, 1), (null, 1), (3, 1);
   
   query I
   select last_value(a order by b) from t;
   ----
   1
   
   query TT
   explain select last_value(a order by b) from t;
   ----
   logical_plan
   01)Aggregate: groupBy=[[]], aggr=[[last_value(t.a) ORDER BY [t.b ASC NULLS LAST]]]
   02)--TableScan: t projection=[a, b]
   physical_plan
   01)AggregateExec: mode=Final, gby=[], aggr=[last_value(t.a) ORDER BY [t.b ASC NULLS LAST]]
   02)--CoalescePartitionsExec
   03)----AggregateExec: mode=Partial, gby=[], aggr=[last_value(t.a) ORDER BY [t.b ASC NULLS LAST]]
   04)------RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=1
   05)--------MemoryExec: partitions=1, partition_sizes=[1]
   ```
   
   Given that the `MemoryExec` has a single partition, all the data ends up in
a single batch. Even though we have 4 partitions, `update_batch` is only called
once. There is no trivial way to test multiple `update_batch` calls with
different batches.
   
   `Insert into Table ...` behaves the same way.
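
   To make the goal concrete, here is a rough, self-contained sketch of the
   behaviour I would like a test to exercise. `ToyLastValue` and
   `run_in_batches` are hypothetical names made up for illustration; this is
   not DataFusion's `Accumulator` trait, only a toy model of it. It feeds the
   rows in chunks to several partial accumulators and then merges them. The
   ordering keys are made distinct here (unlike `b` in the table above) so the
   answer does not depend on how batches are assigned to partitions.

   ```rust
   /// Toy stand-in for an order-sensitive `last_value` accumulator.
   /// Hypothetical type for illustration, NOT DataFusion's `Accumulator`.
   #[derive(Debug, Default, Clone)]
   struct ToyLastValue {
       // (ordering key, value) of the current "last" row, if any row was seen.
       state: Option<(i64, Option<i64>)>,
   }

   impl ToyLastValue {
       /// Keep the candidate unless its key is strictly smaller than the current one.
       fn offer(&mut self, key: i64, value: Option<i64>) {
           match self.state {
               Some((best, _)) if key < best => {}
               _ => self.state = Some((key, value)),
           }
       }

       /// Consume one batch of (value, ordering key) rows.
       fn update_batch(&mut self, batch: &[(Option<i64>, i64)]) {
           for &(value, key) in batch {
               self.offer(key, value);
           }
       }

       /// Fold the states of partial accumulators into this one.
       fn merge_batch(&mut self, partials: &[ToyLastValue]) {
           for partial in partials {
               if let Some((key, value)) = partial.state {
                   self.offer(key, value);
               }
           }
       }

       /// Value of the row with the largest ordering key (may be NULL / `None`).
       fn evaluate(&self) -> Option<i64> {
           self.state.and_then(|(_, value)| value)
       }
   }

   /// Split `rows` into chunks of `batch_size`, hand them round-robin to
   /// `partitions` partial accumulators, then merge into a final accumulator.
   /// Both `batch_size` and `partitions` must be non-zero.
   fn run_in_batches(
       rows: &[(Option<i64>, i64)],
       batch_size: usize,
       partitions: usize,
   ) -> Option<i64> {
       let mut partials = vec![ToyLastValue::default(); partitions];
       for (i, batch) in rows.chunks(batch_size).enumerate() {
           partials[i % partitions].update_batch(batch);
       }
       let mut final_acc = ToyLastValue::default();
       final_acc.merge_batch(&partials);
       final_acc.evaluate()
   }

   fn main() {
       // (a, b) rows; `b` is the ordering key and is distinct on purpose.
       let rows: [(Option<i64>, i64); 8] = [
           (Some(1), 1), (Some(2), 2), (None, 3), (Some(3), 4),
           (Some(1), 5), (Some(2), 6), (None, 7), (Some(3), 8),
       ];
       let single = run_in_batches(&rows, rows.len(), 1);
       let multi = run_in_batches(&rows, 2, 4);
       // Many small batches plus a merge should agree with one big batch.
       assert_eq!(single, multi);
       println!("single = {:?}, multi = {:?}", single, multi);
   }
   ```

   The point of the sketch is the final assertion: a test worth having is one
   where the multi-batch path (several `update_batch` calls followed by
   `merge_batch`) is actually taken and compared against the single-batch
   result.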



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

