Dandandan commented on issue #7373:
URL: https://github.com/apache/arrow-datafusion/issues/7373#issuecomment-1689989520

   > @Dandandan Thanks for the reply. If you read my second comment, I think the 
main thing here is adding documentation about how using ORDER BY after joins may 
be necessary to get deterministic ordering for operations such as group-bys. I 
think this is somewhat known/implied for technologies similar to this, but it's 
a gotcha that people may forget about and is probably worth mentioning. Maybe 
just a section in the SQL capabilities docs about the differences between DF 
and a typical SQL DB would be useful.
   
   Thanks for clarifying.
   Yes, some join types are not order-preserving. DataFusion also 
parallelizes the query plan, which can change the order of the output as well.
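   
   To illustrate the parallelization point, here is a minimal Python sketch. It 
is not DataFusion code: `repartition` and `collect` are hypothetical helpers 
that only mimic routing rows to partitions by key and then concatenating the 
partition outputs. The rows come back in a different order, but the set of rows 
is unchanged, and an explicit sort (the equivalent of `ORDER BY`) restores a 
deterministic order.

```python
from collections import Counter


def repartition(rows, num_partitions):
    """Hypothetical stand-in for a hash-repartition step: route rows by key."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in rows:
        # Integer keys keep the routing deterministic for this illustration.
        partitions[key % num_partitions].append((key, value))
    return partitions


def collect(partitions):
    """Concatenate partition outputs without re-sorting them."""
    return [row for part in partitions for row in part]


rows = [(3, "c"), (1, "a"), (2, "b"), (4, "d"), (1, "e")]
out = collect(repartition(rows, num_partitions=2))

print(out != rows)                    # the order of rows changed
print(Counter(out) == Counter(rows))  # the rows themselves did not
print(sorted(out) == sorted(rows))    # an explicit sort gives a stable order
```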
   
   Though I fail to see how the example you linked applies to those queries and 
`group by` specifically: `group by` shouldn't depend on the order of the input 
(some aggregation _functions_ might) and should not change the _number of 
rows_ in the result, only the _order of rows_.
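   
   A small Python sketch of that claim, assuming a simple hash-based grouping 
(the `group_by` helper is hypothetical and not how DataFusion implements 
aggregation): feeding the same rows in a different order yields the same number 
of groups and the same counts and sums, while the order of the output rows and 
an order-sensitive aggregate such as "first value" can differ.

```python
def group_by(rows):
    """Return {key: (count, total, first_value)} for (key, value) rows."""
    groups = {}
    for key, value in rows:
        # A new key starts with count 0, total 0 and the current value as "first".
        count, total, first = groups.get(key, (0, 0, value))
        groups[key] = (count + 1, total + value, first)
    return groups


rows = [("a", 1), ("b", 2), ("a", 3), ("b", 4), ("c", 5)]
forward = group_by(rows)
backward = group_by(list(reversed(rows)))

# Same number of output rows regardless of input order.
print(len(forward) == len(backward))

# Order-insensitive aggregates (count, sum) agree for every key.
print(all(forward[k][:2] == backward[k][:2] for k in forward))

# The order of the output rows differs (dicts keep insertion order).
print(list(forward) != list(backward))

# An order-sensitive aggregate ("first value") differs for key "a".
print(forward["a"][2] != backward["a"][2])
```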


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

