[
https://issues.apache.org/jira/browse/SPARK-14566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15237690#comment-15237690
]
Cheng Lian commented on SPARK-14566:
------------------------------------
This bug is exposed after fixing SPARK-14458.
Together, these two bugs happened to cheat all our existing test cases.
> When appending to partitioned persisted table, we should apply a projection
> over input query plan using existing metastore schema
> ---------------------------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-14566
> URL: https://issues.apache.org/jira/browse/SPARK-14566
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: Cheng Lian
> Assignee: Cheng Lian
>
> Take the following snippets slightly modified from test case
> "SQLQuerySuite.SPARK-11453: append data to partitioned table" as an example:
> {code}
> val df1 = Seq("1" -> "10", "2" -> "20").toDF("i", "j")
> df1.write.partitionBy("i").saveAsTable("tbl11453")
> val df2 = Seq("3" -> "30").toDF("i", "j")
> df2.write.mode(SaveMode.Append).partitionBy("i").saveAsTable("tbl11453")
> {code}
> Although {{df1.schema}} is {{<i:STRING, j:STRING>}}, schema of persisted
> table {{tbl11453}} is actually {{<j:STRING, i:STRING>}} because {{i}} is a
> partition column, which is always appended after all data columns. Thus, when
> appending {{df2}}, schemata of {{df2}} and persisted table {{tbl11453}} are
> actually different.
> In current master branch, {{CreateMetastoreDataSourceAsSelect}} simply
> applies existing metastore schema to the input query plan ([see
> here|https://github.com/apache/spark/blob/75e05a5a964c9585dd09a2ef6178881929bab1f1/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/commands.scala#L225]),
> which is wrong. A projection should be used instead to adjust column order
> here.
> In branch-1.6, [this projection is added in
> {{InsertIntoHadoopFsRelation}}|https://github.com/apache/spark/blob/663a492f0651d757ea8e5aeb42107e2ece429613/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelation.scala#L99-L104],
> but was removed in Spark 2.0. Replacing the aforementioned line in
> {{CreateMetastoreDataSourceAsSelect}} with a projection would be
> preferable.
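The column-ordering behavior described above can be sketched in plain Scala. This is an illustrative model only, not Spark's internal API: `persistedOrder` mimics how partition columns are appended after all data columns when the table is persisted, and `projectRow` models the proposed fix of projecting input rows into the existing metastore column order by name rather than relying on positional order.

```scala
// Illustrative sketch only: these helpers are not Spark internals, just a
// model of the column-ordering behavior described in the issue.

// Partition columns are appended after all data columns when the table is
// persisted, so <i, j> partitioned by i is stored as <j, i>.
def persistedOrder(dataFrameCols: Seq[String], partitionCols: Seq[String]): Seq[String] = {
  val (part, data) = dataFrameCols.partition(partitionCols.contains)
  data ++ part
}

// The proposed fix, modeled on a row as a name -> value map: project each
// input row into the existing metastore column order by name, instead of
// blindly applying the metastore schema to the input's positional order.
def projectRow(row: Map[String, String], metastoreCols: Seq[String]): Seq[String] =
  metastoreCols.map(row)

val tableCols = persistedOrder(Seq("i", "j"), Seq("i"))
println(tableCols)                                           // List(j, i)
println(projectRow(Map("i" -> "3", "j" -> "30"), tableCols)) // List(30, 3)
```

Under this model, appending {{df2}} without the projection would write the value of {{i}} into the {{j}} column and vice versa, which is exactly the silent corruption the projection avoids.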
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]