jayzhan211 commented on code in PR #14232:
URL: https://github.com/apache/datafusion/pull/14232#discussion_r1929384667


##########
datafusion/functions-aggregate/src/first_last.rs:
##########
@@ -569,6 +573,13 @@ impl LastValueAccumulator {
             })
             .collect::<Vec<_>>();
 
+        // Order by indices for cases where the values are the same, we expect 
the last index
+        let indices: UInt64Array = (0..num_rows).map(|x| x as u64).collect();
+        sort_columns.push(SortColumn {

Review Comment:
   ~Maybe we need benchmark on this.
   With additional indices columns, we are able to optimize it with select n_th 
logic
   Without additional indices columns, we need to iterate them all the find the 
correct one, it includes the cost of comparison~
   
   
   > since this will work only for single threaded queries, while, I suppose, 
the primarily use-case is multithreaded execution, which messes up the original 
order of records in files due to parallel reads followed by repartitioning. 
   
   I think even in this case, we still expect to return the correct number?
   
   I agree that in 2-phase step, the **true** last value is calculated in 
`merge_batch`, but if we run it in AggregateMode::Single, we would need to get 
the **true** last value in `update_batch`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to