Dandandan commented on issue #7373: URL: https://github.com/apache/arrow-datafusion/issues/7373#issuecomment-1689989520
> @Dandandan Thanks for the reply. If you read my second comment, I think the main thing here is adding documentation about how using ORDER BY after joins may be necessary to get deterministic ordering for operations such as group-bys. I think this is somewhat known/implied for technologies similar to this, but it's a gotcha that people may forget about and is probably worth mentioning. Maybe just a section in the SQL capabilities docs about the differences between DF and a typical SQL DB would be useful.

Thanks for clarifying. Yes, some join types are not order preserving. DataFusion will also parallelize the query plan, which might change the order of the output as well.

Though I fail to see how the example you linked applies to those queries and to `group by` specifically: `group by` shouldn't depend on the order of the input (some aggregation _functions_ might), and a different input order should change only the _order of rows_ in the result, not the _number of rows_.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
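
A minimal sketch of the `group by` point made in the comment above. It uses Python's built-in `sqlite3` module as a stand-in engine, which is an assumption made purely for illustration: this is not DataFusion, and the table `t`, its columns `k` and `v`, and the helper `grouped_counts` are hypothetical names. The idea is that feeding the same rows in a different order should give the same groups and counts, and that an explicit `ORDER BY` is what pins down the order in which those groups come back.

```python
import random
import sqlite3


def grouped_counts(rows, order_by=False):
    """Load rows into an in-memory SQLite table and count rows per key.

    SQLite is only a stand-in here; the table and column names are made up.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (k TEXT, v INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    sql = "SELECT k, COUNT(*) FROM t GROUP BY k"
    if order_by:
        # Without ORDER BY, SQL makes no promise about the output row order.
        sql += " ORDER BY k"
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5), ("a", 6)]

# Same rows, different input order (fixed seed so the run is repeatable).
shuffled = rows[:]
random.Random(0).shuffle(shuffled)

# The set of groups and their counts does not depend on input order.
assert sorted(grouped_counts(rows)) == sorted(grouped_counts(shuffled))

# With ORDER BY, the order of the output rows is deterministic as well.
assert grouped_counts(rows, order_by=True) == grouped_counts(shuffled, order_by=True)
```

If the number of groups differs between runs, the cause is likely somewhere other than input ordering, for example an order-sensitive aggregation function or a difference in the input data itself.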
