[ https://issues.apache.org/jira/browse/SPARK-20660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16023333#comment-16023333 ]
Michel Lemay commented on SPARK-20660:
--------------------------------------

In my opinion, two schemas should be considered the same if their columns are the same, regardless of order. However, throwing an error would be significantly better than doing unexpected things.

> Not able to merge Dataframes with different column orders
> ---------------------------------------------------------
>
>                 Key: SPARK-20660
>                 URL: https://issues.apache.org/jira/browse/SPARK-20660
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Michel Lemay
>            Priority: Minor
>
> Union on two dataframes with different column orders is not supported and
> leads to hard-to-find issues.
> Here is an example showing the issue.
> {code}
> // Run in spark-shell, which provides `spark`, `sc` and the `$"..."` syntax.
> import org.apache.spark.sql.types._
> import org.apache.spark.sql.Row
> val inputSchema = StructType(StructField("key", StringType, nullable = true) ::
>   StructField("value", IntegerType, nullable = true) :: Nil)
> val a = spark.createDataFrame(sc.parallelize(1 to 10).map(x =>
>   Row(x.toString, 555)), inputSchema)
> // Any transformation changing column order will show the problem.
> val b = a.select($"value" * 2 alias "value", $"key")
> // union matches columns by position, not by name.
> a.union(b).show
> // In order to make it work, we need to reorder columns.
> val bCols = a.columns.map(aCol => b(aCol))
> a.union(b.select(bCols: _*)).show
> {code}
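
The behaviour asked for in the comment (treat schemas as equal regardless of column order, and raise an error otherwise) can be sketched on top of the workaround from the quoted example. This is only a minimal sketch: `unionByColumnName` is a hypothetical helper name, not part of Spark, and it assumes that column names are compared case-sensitively and that checking names (not types) is enough.

```scala
import org.apache.spark.sql.DataFrame

// Hypothetical helper (not a Spark API): union two DataFrames by column
// name instead of by position, failing fast when the column names differ.
def unionByColumnName(a: DataFrame, b: DataFrame): DataFrame = {
  val aCols = a.columns
  val bCols = b.columns
  // Compare the sorted names so that a missing or extra column is rejected.
  // Assumption: names are compared case-sensitively.
  require(
    aCols.sorted.sameElements(bCols.sorted),
    s"Cannot union: columns [${aCols.mkString(", ")}] do not match [${bCols.mkString(", ")}]"
  )
  // Reorder b's columns to follow a's order, then apply the positional union.
  a.union(b.select(aCols.map(c => b(c)): _*))
}
```

With `a` and `b` from the quoted example, `unionByColumnName(a, b).show` should give the same rows as the manual reordering, while two DataFrames with different column names fail with an `IllegalArgumentException` instead of being merged by position. On Spark 2.3.0 and later, the built-in `Dataset.unionByName` resolves columns by name and may make such a helper unnecessary.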