[ https://issues.apache.org/jira/browse/SPARK-9265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15558875#comment-15558875 ]
Xiao Li edited comment on SPARK-9265 at 10/11/16 3:14 AM:
----------------------------------------------------------

This has been resolved since Limit and Sort are executed as a `TakeOrderedAndProject` operator. Closing it now. Thanks!

was (Author: smilegator): This has been resolved since our Optimizer pushes `Limit` down below `Sort`. Closing it now. Thanks!

> Dataframe.limit joined with another dataframe can be non-deterministic
> ----------------------------------------------------------------------
>
>                 Key: SPARK-9265
>                 URL: https://issues.apache.org/jira/browse/SPARK-9265
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.4.1
>            Reporter: Tathagata Das
>            Priority: Critical
>
> {code}
> import org.apache.spark.sql._
> import org.apache.spark.sql.functions._
>
> val recentFailures = table("failed_suites").cache()
> val topRecentFailures =
>   recentFailures.groupBy('suiteName).agg(count("*").as('failCount)).orderBy('failCount.desc).limit(10)
> topRecentFailures.show(100)
> val mot = topRecentFailures.as("a").join(recentFailures.as("b"),
>   $"a.suiteName" === $"b.suiteName")
>
> (1 to 10).foreach { i =>
>   println(s"$i: " + mot.count())
> }
> {code}
> This shows:
> {code}
> +--------------------+---------+
> |           suiteName|failCount|
> +--------------------+---------+
> |org.apache.spark....|       85|
> |org.apache.spark....|       26|
> |org.apache.spark....|       26|
> |org.apache.spark....|       17|
> |org.apache.spark....|       17|
> |org.apache.spark....|       15|
> |org.apache.spark....|       13|
> |org.apache.spark....|       13|
> |org.apache.spark....|       11|
> |org.apache.spark....|        9|
> +--------------------+---------+
> 1: 174
> 2: 166
> 3: 174
> 4: 106
> 5: 158
> 6: 110
> 7: 174
> 8: 158
> 9: 166
> 10: 106
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
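The varying counts above stem from re-executing `limit(10)` on a sort with ties (note the repeated failCounts of 26, 17, and 13): "top 10 by failCount" is not a unique row set, so each re-evaluation may pick a different valid answer. A minimal sketch in plain Scala (no Spark; the data and names are illustrative) of why ties at the cutoff are ambiguous, and why adding a secondary sort key makes the selection deterministic:

```scala
// Sketch (plain Scala, no Spark): with ties at the limit cutoff,
// "sort desc, take N" does not pin down a unique set of rows.
object LimitTieDemo {
  // Three suites tied at the same failCount; we keep only the top 2.
  val counts = Seq("suiteA" -> 9, "suiteB" -> 9, "suiteC" -> 9)

  def main(args: Array[String]): Unit = {
    // Two executions that see the rows in different orders both produce
    // a validly sorted result, yet select different suites (stable sort
    // preserves input order among equal keys).
    val run1 = counts.sortBy(-_._2).take(2).map(_._1).toSet
    val run2 = counts.reverse.sortBy(-_._2).take(2).map(_._1).toSet
    println(run1 == run2) // false: both are "correct" top-2 answers

    // Breaking ties with a secondary key (here the suite name) makes the
    // ordering total, so the selected set no longer depends on input order.
    val fix1 = counts.sortBy { case (n, c) => (-c, n) }.take(2).map(_._1)
    val fix2 = counts.reverse.sortBy { case (n, c) => (-c, n) }.take(2).map(_._1)
    println(fix1 == fix2) // true
  }
}
```

In Spark terms the same idea applies: `orderBy('failCount.desc, 'suiteName)` gives a total order, so any re-execution of the limit selects the same rows; the fix referenced in the comment (executing Limit and Sort as one `TakeOrderedAndProject` operator) removes the separate re-evaluated limit step.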