korowa commented on code in PR #14232:
URL: https://github.com/apache/datafusion/pull/14232#discussion_r1930999164
##########
datafusion/functions-aggregate/src/first_last.rs:
##########
@@ -701,9 +713,98 @@ fn convert_to_sort_cols(arrs: &[ArrayRef], sort_exprs:
&LexOrdering) -> Vec<Sort
#[cfg(test)]
mod tests {
use arrow::array::Int64Array;
+ use arrow_schema::Schema;
+ use compute::SortOptions;
+ use datafusion_physical_expr::{expressions::col, PhysicalSortExpr};
use super::*;
+ #[test]
+ fn test_last_value_with_order_bys() -> Result<()> {
+ // TODO: Move this kind of test to slt, we don't have a nice way to
define the batch size for each `update_batch`
Review Comment:
What I've meant was
```sql
statement ok
CREATE TABLE memtest (x INT, y INT);
statement ok
INSERT INTO memtest VALUES (1, 1), (2, 1);
statement ok
INSERT INTO memtest VALUES (2, 2), (3, 1);
statement ok
INSERT INTO memtest VALUES (3, 2), (4, 1);
statement ok
set datafusion.execution.target_partitions = 1;
query II
select x, last_value(y) from memtest group by x;
----
1 1
2 2
3 2
4 1
```
each insert will add new RecordBatch to memory table, so there will be 3
update_batch calls for the accumulator (it will result in more calls for
LastValueAccumulator::update_batch though due to splitting inputs by group
value)
Likely something similar can be done for multiple partitions to make
round-robin repartitioning produce multiple streams to test merge_batch
function.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]