It should be the same as SQL: without an explicit ordering, the row order is not guaranteed. Otherwise it takes away a lot of potential future optimization opportunities.
On Mon, Sep 18 2023 at 8:47 AM, Nicholas Chammas < nicholas.cham...@gmail.com > wrote:

> I’ve always considered DataFrames to be logically equivalent to SQL tables
> or queries.
>
> In SQL, the result order of any query is implementation-dependent without
> an explicit ORDER BY clause. Technically, you could run `SELECT * FROM
> table;` 10 times in a row and get 10 different orderings.
>
> I thought the same applied to DataFrames, but the docstring for the
> recently added method DataFrame.offset (
> https://github.com/apache/spark/pull/40873/files#diff-4ff57282598a3b9721b8d6f8c2fea23a62e4bc3c0f1aa5444527549d1daa38baR1293-R1301
> ) implies otherwise.
>
> This example will work fine in practice, of course. But if DataFrames are
> technically unordered without an explicit ordering clause, then in theory
> a future implementation change may result in “Bob” being the “first” row
> in the DataFrame, rather than “Tom”. That would make the example
> incorrect.
>
> Is that not the case?
>
> Nick
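
To make the SQL side of the comparison concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than Spark. The table name, the rows, and the skip_first helper are hypothetical and chosen only for illustration. The point is that an explicit ORDER BY is what makes "skip the first row" well defined. I would expect the DataFrame analogue to be something like df.orderBy("age").offset(1), but that is an assumption on my part and not what the docstring currently shows.

```python
import sqlite3

# Hypothetical in-memory table, used only to illustrate the point.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (age INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [(14, "Tom"), (23, "Alice"), (16, "Bob")],
)


def skip_first(conn, n):
    """Return all rows after skipping the first n, ordered by (age, name)."""
    # ORDER BY pins down which rows count as "first" before OFFSET skips them.
    # SQLite requires a LIMIT clause with OFFSET; LIMIT -1 means "no limit".
    cursor = conn.execute(
        "SELECT age, name FROM people ORDER BY age, name LIMIT -1 OFFSET ?",
        (n,),
    )
    return cursor.fetchall()


remaining = skip_first(conn, 1)
print(remaining)
```

Without the ORDER BY, the engine is free to return the rows in any order, so which row gets skipped would be an implementation detail.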