jorisvandenbossche commented on issue #36709:
URL: https://github.com/apache/arrow/issues/36709#issuecomment-1641632731

   The groupby feature is part of the Acero query execution engine 
(https://arrow.apache.org/docs/dev/cpp/streaming_execution.html), and in 
general Acero doesn't guarantee a stable ordering of the batches it processes. 
   It does support explicit ordering (either by declaring that the input is 
already ordered, or by sorting the input), and some nodes (like filter) will 
then honor and preserve that ordering of the batches. When the plan is executed 
and the results are gathered into a Table, this order is preserved (see 
https://arrow.apache.org/docs/dev/cpp/api/acero.html#_CPPv4NK5arrow5acero8ExecNode8orderingEv
 for details). But I _assume_ a (hash) aggregation doesn't use this ordering 
for its execution (and will drop any ordering of the input to that node).
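
   To illustrate why a hash aggregation can lose the input order, here is a 
minimal pure-Python sketch. It is a toy model and not Acero's implementation: 
the function name `hash_aggregate_sum`, the sample batches, and the assumption 
that groups are emitted in order of first appearance are all made up for 
illustration. It only shows that the same batches arriving in a different order 
give the same groups and sums but a different row order, and that an explicit 
sort after the aggregation makes the result deterministic.

```python
def hash_aggregate_sum(batches):
    """Toy hash aggregation: sum the values per key.

    Groups come out in the order in which each key was first seen
    (an assumption of this sketch, not a statement about Acero).
    """
    sums = {}  # dicts preserve insertion order, i.e. first-seen order
    for batch in batches:
        for key, value in batch:
            sums[key] = sums.get(key, 0) + value
    return list(sums.items())


# Hypothetical input: two batches of (key, value) rows.
batches = [
    [("b", 1), ("a", 2)],
    [("c", 3), ("a", 4)],
]

# The same batches, processed in two different arrival orders.
in_order = hash_aggregate_sum(batches)
reordered = hash_aggregate_sum(reversed(batches))

# The groups and sums are identical, but the row order differs.
assert sorted(in_order) == sorted(reordered)
assert in_order != reordered

# Sorting on the key after the aggregation gives a deterministic order.
stable = sorted(in_order)
print(stable)
```

   In pyarrow the analogous remedy would presumably be to sort the aggregated 
Table explicitly (for example with `Table.sort_by`) rather than rely on the 
order in which the groups come out.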
   
   Now, I can confirm the difference: the result order was indeed stable in 
12.0, but is not always stable in 13.0. I am not sure whether something changed 
in the 13.0 release cycle that caused this, but I think in general the new 
behaviour is what can be expected. cc @westonpace 


