Cheng Lian created SPARK-14566:
----------------------------------
Summary: When appending to partitioned persisted table, we should
apply a projection over input query plan using existing metastore schema
Key: SPARK-14566
URL: https://issues.apache.org/jira/browse/SPARK-14566
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.0.0
Reporter: Cheng Lian
Assignee: Cheng Lian
Take the following snippet, slightly modified from test case
"SQLQuerySuite.SPARK-11453: append data to partitioned table", as an example:
{code}
val df1 = Seq("1" -> "10", "2" -> "20").toDF("i", "j")
df1.write.partitionBy("i").saveAsTable("tbl11453")
val df2 = Seq("3" -> "30").toDF("i", "j")
df2.write.mode(SaveMode.Append).partitionBy("i").saveAsTable("tbl11453")
{code}
Although {{df1.schema}} is {{<i:STRING, j:STRING>}}, schema of persisted table
{{tbl11453}} is actually {{<j:STRING, i:STRING>}} because {{i}} is a partition
column, which is always appended after all data columns. Thus, when appending
{{df2}}, schemata of {{df2}} and persisted table {{tbl11453}} are actually
different.
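The reordering described above can be sketched in plain Scala (no Spark dependency; the helper name {{reorder}} is hypothetical, for illustration only): data columns come first, and partition columns are appended last.

```scala
// Minimal sketch of how a persisted table's schema is derived:
// partition columns are moved after all data columns.
def reorder(schema: Seq[String], partitionCols: Set[String]): Seq[String] = {
  val (parts, data) = schema.partition(partitionCols.contains)
  data ++ parts
}

// df1's schema is <i, j>, but "i" is a partition column,
// so the persisted schema becomes <j, i>:
val persisted = reorder(Seq("i", "j"), Set("i"))
// persisted == Seq("j", "i")
```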
In current master branch, {{CreateMetastoreDataSourceAsSelect}} simply applies
existing metastore schema to the input query plan ([see
here|https://github.com/apache/spark/blob/75e05a5a964c9585dd09a2ef6178881929bab1f1/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/commands.scala#L225]),
which is wrong. A projection should be used instead to adjust column order
here.
In branch-1.6, [this projection is added in
{{InsertIntoHadoopFsRelation}}|https://github.com/apache/spark/blob/663a492f0651d757ea8e5aeb42107e2ece429613/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelation.scala#L99-L104],
but was removed in Spark 2.0. Replacing the aforementioned line in
{{CreateMetastoreDataSourceAsSelect}} with a projection would be more preferable.
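The intended fix can be illustrated with a plain-Scala sketch (the helper name {{projectTo}} and the row representation are assumptions for illustration, not the actual Catalyst code): rather than relabeling the input plan's columns with the metastore schema, each input row should be projected by name into the metastore column order.

```scala
// Hedged sketch of the proposed projection: look up each metastore column
// by name in the input row, so values land in the correct positions even
// when the input column order differs from the persisted schema.
def projectTo(metastoreSchema: Seq[String])(row: Map[String, String]): Seq[String] =
  metastoreSchema.map(row)

// df2's row {i -> "3", j -> "30"} projected to the persisted schema <j, i>:
val projected = projectTo(Seq("j", "i"))(Map("i" -> "3", "j" -> "30"))
// projected == Seq("30", "3")
```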
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]