[ https://issues.apache.org/jira/browse/SPARK-15825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-15825:
------------------------------------

    Assignee:     (was: Apache Spark)

> sort-merge-join gives invalid results when joining on a tupled key
> ------------------------------------------------------------------
>
>                 Key: SPARK-15825
>                 URL: https://issues.apache.org/jira/browse/SPARK-15825
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>         Environment: spark 2.0.0-SNAPSHOT
>            Reporter: Andres Perez
>
> {noformat}
>   import org.apache.spark.sql.functions
>   val left = List("0", "1", "2").toDS()
>     .map{ k => ((k, 0), "l") }
>   val right = List("0", "1", "2").toDS()
>     .map{ k => ((k, 0), "r") }
>   val result = left.toDF("k", "v").as[((String, Int), String)].alias("left")
>     .joinWith(right.toDF("k", "v").as[((String, Int), String)].alias("right"),
>       functions.col("left.k") === functions.col("right.k"), "inner")
>     .as[(((String, Int), String), ((String, Int), String))]
> {noformat}
> When broadcast joins are enabled, we get the expected output:
> {noformat}
> (((0,0),l),((0,0),r))
> (((1,0),l),((1,0),r))
> (((2,0),l),((2,0),r))
> {noformat}
> However, when broadcast joins are disabled (i.e. setting spark.sql.autoBroadcastJoinThreshold to -1), the result is incorrect:
> {noformat}
> (((2,0),l),((2,-1),))
> (((0,0),l),((0,-313907893),))
> (((1,0),l),((null,-313907893),))
> {noformat}
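
For anyone trying to reproduce the report, the following is a minimal sketch of how broadcast joins might be disabled before running the join. It assumes a Spark 2.x spark-shell session; the application name and the local master are illustrative assumptions, not part of the original report, and the join itself is taken from the reporter's snippet with the lines rearranged so they can be pasted one at a time.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions

// Assumption: in spark-shell this returns the session that already exists;
// the app name and master only matter when no session has been created yet.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("SPARK-15825-repro")
  .getOrCreate()
import spark.implicits._

// Disable automatic broadcast joins. Set this before building the datasets
// so that the join is planned with the setting in effect.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", -1L)

// Datasets keyed by a (String, Int) tuple, as in the report.
val left = List("0", "1", "2").toDS().map(k => ((k, 0), "l"))
val right = List("0", "1", "2").toDS().map(k => ((k, 0), "r"))

val l = left.toDF("k", "v").as[((String, Int), String)].alias("left")
val r = right.toDF("k", "v").as[((String, Int), String)].alias("right")

// Join on the tupled key column.
val result = l.joinWith(
  r,
  functions.col("left.k") === functions.col("right.k"),
  "inner")

// Print the physical plan to check which join operator was selected.
result.explain()

// Print the joined rows for comparison with the outputs quoted in the report.
result.collect().foreach(println)
```

Setting the threshold back to a positive value (the number of bytes below which a table may be broadcast) and rebuilding `result` should allow the two outputs quoted above to be compared in the same session.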