[ 
https://issues.apache.org/jira/browse/SPARK-16207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15358681#comment-15358681
 ] 

Max Moroz commented on SPARK-16207:
-----------------------------------

Well, I hope most operations that keep the number of rows unchanged 
(select, withColumn, withColumnRenamed, drop, collect, fillna, foreach, rdd) 
preserve order, but I'm not sure whether coalesce does. Also, first and take 
should preserve the order. 

I'm guessing filter and distinct don't preserve order?

I was surprised about groupby, but that's my fault for believing blogs / SO posts.
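
To make the intuition behind these guesses concrete, here is a toy model in 
plain Python. It is not Spark code and makes no claim about what Spark 
actually guarantees. Lists stand in for partitions, and the helper names 
(narrow_map, hash_group_count, collect) are hypothetical. The only assumption 
it illustrates is that a row-by-row operation leaves rows where they are, 
while a grouping that routes rows by key hash emits groups in hash layout 
order rather than input order.

```python
# Toy model only: plain Python lists stand in for partitions; this is not Spark.

def narrow_map(partitions, fn):
    """Apply fn row by row inside each partition; no row changes position."""
    return [[fn(row) for row in part] for part in partitions]

def hash_group_count(partitions, key, num_buckets):
    """Count rows per key after routing each row to bucket hash(key) % num_buckets."""
    buckets = [{} for _ in range(num_buckets)]
    for part in partitions:
        for row in part:
            k = key(row)
            bucket = buckets[hash(k) % num_buckets]
            bucket[k] = bucket.get(k, 0) + 1
    # Results are read bucket by bucket, so output order follows the hash layout.
    return [(k, n) for bucket in buckets for k, n in bucket.items()]

def collect(partitions):
    """Concatenate partitions in partition order."""
    return [row for part in partitions for row in part]

# Two "partitions"; integer keys keep Python's hash() deterministic.
rows = [[(5, "a"), (2, "b")], [(5, "c"), (4, "d")]]

mapped = collect(narrow_map(rows, lambda r: (r[0], r[1].upper())))
grouped = hash_group_count(rows, key=lambda r: r[0], num_buckets=2)

print(mapped)   # rows stay in input order
print(grouped)  # keys first appear as 5, 2, 4 but are emitted as 2, 4, 5
```

In this model the only way to get a dependable order after the grouping step 
is to sort the result explicitly, which is why documented guarantees for the 
real operations would help.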


> order guarantees for DataFrames
> -------------------------------
>
>                 Key: SPARK-16207
>                 URL: https://issues.apache.org/jira/browse/SPARK-16207
>             Project: Spark
>          Issue Type: Documentation
>          Components: Spark Core
>    Affects Versions: 1.6.1
>            Reporter: Max Moroz
>            Priority: Minor
>
> There's no clear explanation in the documentation about what guarantees are 
> available for the preservation of order in DataFrames. Different blogs, SO 
> answers, and posts on course websites suggest different things. It would be 
> good to provide clarity on this.
> Examples of questions on which I could not find clarification:
> 1) Does groupby() preserve order?
> 2) Does take() preserve order?
> 3) Is DataFrame guaranteed to have the same order of lines as the text file 
> it was read from? (Or as the json file, etc.)



