korowa commented on code in PR #14232: URL: https://github.com/apache/datafusion/pull/14232#discussion_r1930999164
########## datafusion/functions-aggregate/src/first_last.rs: ########## @@ -701,9 +713,98 @@ fn convert_to_sort_cols(arrs: &[ArrayRef], sort_exprs: &LexOrdering) -> Vec<Sort #[cfg(test)] mod tests { use arrow::array::Int64Array; + use arrow_schema::Schema; + use compute::SortOptions; + use datafusion_physical_expr::{expressions::col, PhysicalSortExpr}; use super::*; + #[test] + fn test_last_value_with_order_bys() -> Result<()> { + // TODO: Move this kind of test to slt, we don't have a nice way to define the batch size for each `update_batch` Review Comment: What I've meant was ```sql statement ok CREATE TABLE memtest (x INT, y INT); statement ok INSERT INTO memtest VALUES (1, 1), (2, 1); statement ok INSERT INTO memtest VALUES (2, 2), (3, 1); statement ok INSERT INTO memtest VALUES (3, 2), (4, 1); statement ok set datafusion.execution.target_partitions = 1; query II select x, last_value(y) from memtest group by x; ---- 1 1 2 2 3 2 4 1 ``` each insert will add new RecordBatch to memory table, so there will be 3 update_batch calls for the accumulator (it will result in more calls for LastValueAccumulator::update_batch though due to splitting inputs by group value) Likely something similar can be done for multiple partitions to make round-robin repartitioning produce multiple streams to test merge_batch function. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org