[ 
https://issues.apache.org/jira/browse/SPARK-16207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15358606#comment-15358606
 ] 

Max Moroz edited comment on SPARK-16207 at 7/1/16 8:01 AM:
-----------------------------------------------------------

I think it would also be important to say which methods preserve order and 
which don't. I did my best to describe it (by referring to methods that "don't 
involve grouping or sorting"), but that's rather vague.

Maybe I'm exaggerating the importance of this point, but for someone like me, 
who's new to Spark, it's very hard to figure out whether a DF kept or lost the 
ordering created by a previously executed orderBy. For instance, I never found 
anything in the docs to say that groupBy may not preserve ordering, so I'm glad 
you mentioned it.
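
To show why a groupBy may not preserve ordering, here is a rough toy model in plain Python. It is not Spark code: `toy_group_count`, its three partitions, and its byte-sum partitioning rule are all made up for illustration, on the assumption that grouping sends rows to partitions chosen from the key and that the result is then read partition by partition.

```python
from collections import defaultdict


def toy_group_count(rows, num_partitions=3):
    """Toy model of a shuffle-based group-and-count; hypothetical, not Spark."""
    # Shuffle step: each row goes to a partition chosen from its key alone.
    partitions = [defaultdict(int) for _ in range(num_partitions)]
    for key, _value in rows:
        partitions[sum(key.encode()) % num_partitions][key] += 1
    # The result is read partition by partition, so the input order is lost.
    return [(k, c) for part in partitions for k, c in part.items()]


# Sorted input, standing in for a DF after orderBy.
rows = sorted([("a", 1), ("b", 2), ("c", 3), ("d", 4), ("b", 5)])

grouped = toy_group_count(rows)
keys = [k for k, _ in grouped]   # key order after the toy grouping
resorted = sorted(grouped)       # sorting again restores a known order
```

In this model the keys come back ordered by partition rather than by the earlier sort, so the only way to be sure of the order is to sort again after grouping. Whether a given Spark method behaves like this is exactly what the docs would need to spell out.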

As for including this note in numerous functions, would it not be easier (if 
only for maintenance reasons) to write this paragraph just once, and simply 
include a reference to it (with a hyperlink or otherwise) from those methods 
that depend on order?


was (Author: mmoroz):
Would it not be easier (if only for maintenance reasons) to write this 
paragraph just once, and simply include a reference to it (with a hyperlink or 
otherwise) from those methods that depend on order?

Also, it would be important to say which methods preserve order and which 
don't. I did my best to describe it (by referring to methods that "don't 
involve grouping or sorting"), but that's rather vague.

Maybe I'm exaggerating the importance of this point, but for someone like me, 
who's new to Spark, it's very hard to figure out whether a DF kept or lost the 
ordering created by a previously executed orderBy. For instance, I never found 
anything in the docs to say that groupBy may not preserve ordering, so I'm glad 
you mentioned it.

> order guarantees for DataFrames
> -------------------------------
>
>                 Key: SPARK-16207
>                 URL: https://issues.apache.org/jira/browse/SPARK-16207
>             Project: Spark
>          Issue Type: Documentation
>          Components: Spark Core
>    Affects Versions: 1.6.1
>            Reporter: Max Moroz
>            Priority: Minor
>
> There's no clear explanation in the documentation about what guarantees are 
> available for the preservation of order in DataFrames. Different blogs, SO 
> answers, and posts on course websites suggest different things. It would be 
> good to provide clarity on this.
> Examples of questions on which I could not find clarification:
> 1) Does groupby() preserve order?
> 2) Does take() preserve order?
> 3) Is DataFrame guaranteed to have the same order of lines as the text file 
> it was read from? (Or as the json file, etc.)


