korowa commented on code in PR #14232: URL: https://github.com/apache/datafusion/pull/14232#discussion_r1929800732
##########
datafusion/functions-aggregate/src/first_last.rs:
##########
@@ -569,6 +573,13 @@ impl LastValueAccumulator {
             })
             .collect::<Vec<_>>();
+        // Order by indices for cases where the values are the same, we expect the last index
+        let indices: UInt64Array = (0..num_rows).map(|x| x as u64).collect();
+        sort_columns.push(SortColumn {

Review Comment:
   > I think even in this case, we still expect to return the correct number?

   I don't think we can: the result depends on the order in which the partial aggregation streams complete, which is not really predictable, so the function may produce non-deterministic output in this case.

   > Alternative solution is that the users can add another column by themselves to get the true last value if they need for their scenario

   I'd say that this is how it works now, which seems to be good enough.

   Also, maybe the initial version of this PR was sufficient? It didn't affect performance, and for `target_partitions=1` it guaranteed that the function returns the value that is placed last in the source files.
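
   To illustrate the point about completion order, here is a minimal std-only sketch. It is not DataFusion's implementation: `PartialLast`, `partition_last`, and `merge` are hypothetical names, and the merge rule (on equal ordering keys, the later-arriving state wins) is an assumption made for the illustration. Within a single partition the row index works as a tie-breaker, but once two partitions produce partial states with equal keys, the merged result follows arrival order.

```rust
// Hypothetical stand-in for a partial aggregation state: the value selected
// by one partition, plus the ordering key it was selected by.
#[derive(Clone, Copy, Debug, PartialEq)]
struct PartialLast {
    order_key: i64,
    value: i64,
}

// Last row of one partition under ORDER BY order_key. Rows are (key, value).
// The row index is the tie-breaker, so among equal keys the row placed last wins.
fn partition_last(rows: &[(i64, i64)]) -> Option<PartialLast> {
    rows.iter()
        .enumerate()
        .max_by_key(|(idx, (key, _))| (*key, *idx))
        .map(|(_, &(order_key, value))| PartialLast { order_key, value })
}

// Merge partial states in arrival order (assumed rule: on equal keys the
// later arrival wins). This is where completion order leaks into the result.
fn merge(states: &[PartialLast]) -> Option<PartialLast> {
    states
        .iter()
        .copied()
        .reduce(|acc, s| if s.order_key >= acc.order_key { s } else { acc })
}

fn main() {
    // Two partitions whose maximum ordering key is the same (2).
    let p0 = partition_last(&[(1, 10), (2, 20), (2, 21)]).unwrap();
    let p1 = partition_last(&[(2, 30), (1, 11)]).unwrap();

    // Same inputs, different completion order, different answer.
    println!("p0 then p1: {:?}", merge(&[p0, p1]));
    println!("p1 then p0: {:?}", merge(&[p1, p0]));
}
```

   With a single partition there is only one partial state, so the merge step has nothing to choose between and the index tie-breaker alone decides the result, which matches the `target_partitions=1` guarantee mentioned above.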