Re: Are DataFrame rows ordered without an explicit ordering clause?

Reynold Xin Mon, 18 Sep 2023 08:55:59 -0700

It should be the same as SQL. Otherwise it takes away a lot of potential future 
optimization opportunities.


On Mon, Sep 18 2023 at 8:47 AM, Nicholas Chammas < nicholas.cham...@gmail.com > 
wrote:

> 
> I’ve always considered DataFrames to be logically equivalent to SQL tables
> or queries.
> 
> 
> In SQL, the result order of any query is implementation-dependent without
> an explicit ORDER BY clause. Technically, you could run `SELECT * FROM
> table;` 10 times in a row and get 10 different orderings.
> 
> 
> I thought the same applied to DataFrames, but the docstring for the
> recently added method DataFrame.offset (
> https://github.com/apache/spark/pull/40873/files#diff-4ff57282598a3b9721b8d6f8c2fea23a62e4bc3c0f1aa5444527549d1daa38baR1293-R1301
> ) implies otherwise.
> 
> 
> This example will work fine in practice, of course. But if DataFrames are
> technically unordered without an explicit ordering clause, then in theory
> a future implementation change may result in “Bob” being the “first” row
> in the DataFrame, rather than “Tom”. That would make the example
> incorrect.
> 
> 
> Is that not the case?
> 
> 
> Nick
>
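
To make the point above concrete, here is a minimal sketch of the SQL side of the argument. It uses Python's built-in sqlite3 module rather than Spark, and the table name `people`, the helper `rows_after_offset`, and the sample rows are all assumptions chosen for illustration, not taken from the docstring under discussion. The idea is only that an offset becomes well-defined once an explicit ORDER BY is present:

```python
import sqlite3


def rows_after_offset(conn, n):
    """Return names after skipping the first n rows under an explicit ordering.

    Without the ORDER BY, which rows count as "the first n" would be left to
    the engine, so the result of OFFSET would not be guaranteed.
    """
    cur = conn.execute(
        # SQLite requires a LIMIT before OFFSET; LIMIT -1 means "no limit".
        "SELECT name FROM people ORDER BY name LIMIT -1 OFFSET ?",
        (n,),
    )
    return [row[0] for row in cur.fetchall()]


# Hypothetical sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (age INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [(14, "Tom"), (23, "Alice"), (16, "Bob")],
)

result = rows_after_offset(conn, 1)
print(result)
```

In PySpark the analogous approach would presumably be to sort first, for example `df.orderBy("name").offset(1)`, so that the example does not depend on whatever order the rows happen to arrive in.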
