korowa commented on code in PR #14232:
URL: https://github.com/apache/datafusion/pull/14232#discussion_r1929800732
##########
datafusion/functions-aggregate/src/first_last.rs:
##########
@@ -569,6 +573,13 @@ impl LastValueAccumulator {
})
.collect::<Vec<_>>();
+ // Order by indices for cases where the values are the same, we expect the last index
+ let indices: UInt64Array = (0..num_rows).map(|x| x as u64).collect();
+ sort_columns.push(SortColumn {
Review Comment:
> I think even in this case, we still expect to return the correct number?
Don't think we can: it depends on the order in which the partial aggregation
streams complete, which is not really predictable, so the function may
produce non-deterministic output in this case.
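To illustrate why no answer can be guaranteed here, below is a minimal plain-Rust sketch of merging partial `last_value` states. It is not DataFusion's accumulator API: `PartialState`, `merge`, and `finalize` are hypothetical names, and the sketch assumes a merge keeps the incoming state when ordering keys tie ("last wins"). Once two partitions report the same ordering key, the final value depends only on which stream happened to complete last.

```rust
/// Hypothetical partial state produced by one partition: the candidate
/// value and the ordering key it was selected by.
#[derive(Clone, Copy, Debug, PartialEq)]
struct PartialState {
    value: i64,
    order_key: i64,
}

/// Merge two partial states, keeping the one with the larger ordering key.
/// On a tie nothing is left to decide by, so the state merged *later* wins.
fn merge(acc: PartialState, incoming: PartialState) -> PartialState {
    if incoming.order_key >= acc.order_key {
        incoming
    } else {
        acc
    }
}

/// Fold partial states in the order their streams happened to complete.
/// Returns `None` when there are no partial states at all.
fn finalize(completion_order: &[PartialState]) -> Option<i64> {
    let mut iter = completion_order.iter().copied();
    let first = iter.next()?;
    Some(iter.fold(first, merge).value)
}
```

With two states that both carry `order_key: 10` but values `1` and `2`, folding them as `[p1, p2]` yields `Some(2)` while `[p2, p1]` yields `Some(1)`: identical input data, different answer.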
> Alternative solution is that the users can add another column by themselves to get the true last value if they need for their scenario
I'd say that this is how it works now, which seems to be good enough.
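The user-side workaround quoted above amounts to making the ordering key unique, for example ordering by `(ts, seq)` where `seq` is a hypothetical extra column the user supplies. The plain-Rust sketch below (hypothetical names, not DataFusion code) assumes `seq` is unique and shows that the merged result then no longer depends on completion order.

```rust
/// Hypothetical partial state whose ordering key is a composite
/// `(ts, seq)` tuple; `seq` is an extra, user-supplied tiebreaker column.
#[derive(Clone, Copy, Debug, PartialEq)]
struct PartialState {
    value: i64,
    order_key: (i64, u64),
}

/// Keep the state with the larger composite key. Tuples compare
/// lexicographically, so `seq` only matters when `ts` is equal.
fn merge(acc: PartialState, incoming: PartialState) -> PartialState {
    if incoming.order_key >= acc.order_key {
        incoming
    } else {
        acc
    }
}

/// Fold partial states in whatever order their streams completed.
fn finalize(completion_order: &[PartialState]) -> Option<i64> {
    let mut iter = completion_order.iter().copied();
    let first = iter.next()?;
    Some(iter.fold(first, merge).value)
}
```

Here both states share `ts = 10`, but `seq` differs, so `[p1, p2]` and `[p2, p1]` both resolve to the state with the larger `seq`. The guarantee holds only as long as the user's extra column really is unique.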
Also, maybe the initial version of this PR was sufficient? It didn't affect
performance, and it guaranteed that, with `target_partitions=1`, the
function returns the value that is placed last in the source files.
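Within a single ordered stream (the `target_partitions=1` case), the row-index tiebreaker from the diff does make the result deterministic. The sketch below mirrors that idea with plain slices instead of Arrow arrays and `SortColumn`s; `last_value_in_batch` is a hypothetical helper, not the PR's code. Rows are sorted by `(ordering key, row index)`, so among rows with equal keys the one that appears last in the input wins.

```rust
/// Return the "last" value of a single batch: the row with the largest
/// ordering key, breaking ties by row position so that the row appearing
/// last in the input wins. Returns `None` for an empty batch.
fn last_value_in_batch(values: &[i64], order_keys: &[i64]) -> Option<i64> {
    assert_eq!(values.len(), order_keys.len(), "columns must have equal length");
    // Row indices play the role of the extra tiebreaker sort column
    // (the `UInt64Array` of `0..num_rows` in the diff).
    let mut indices: Vec<usize> = (0..values.len()).collect();
    // Sort by (ordering key, row index); the index makes every sort key
    // unique, so an unstable sort is safe here.
    indices.sort_unstable_by_key(|&i| (order_keys[i], i));
    indices.last().map(|&i| values[i])
}
```

For values `[10, 20, 30]` with keys `[5, 7, 7]`, the two rows tied at key `7` are ordered by index, so the function returns `Some(30)`. A full sort is used only to mirror the diff; since the composite key is unique, a linear scan with `Iterator::max_by_key` would select the same row.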