Cheng Lian created SPARK-14566:
----------------------------------

             Summary: When appending to partitioned persisted table, we should 
apply a projection over input query plan using existing metastore schema
                 Key: SPARK-14566
                 URL: https://issues.apache.org/jira/browse/SPARK-14566
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.0.0
            Reporter: Cheng Lian
            Assignee: Cheng Lian


Take the following snippet, slightly modified from the test case 
"SQLQuerySuite.SPARK-11453: append data to partitioned table", as an example:

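{code}
// Assumes a spark-shell session, where the implicits providing toDF are already in scope
import org.apache.spark.sql.SaveMode

// Partition column "i" ends up after data column "j" in the persisted table schema
val df1 = Seq("1" -> "10", "2" -> "20").toDF("i", "j")
df1.write.partitionBy("i").saveAsTable("tbl11453")

// df2 still has its columns in (i, j) order, unlike tbl11453, which is (j, i)
val df2 = Seq("3" -> "30").toDF("i", "j")
df2.write.mode(SaveMode.Append).partitionBy("i").saveAsTable("tbl11453")
{code}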

Although {{df1.schema}} is {{<i:STRING, j:STRING>}}, the schema of the persisted 
table {{tbl11453}} is actually {{<j:STRING, i:STRING>}}, because {{i}} is a 
partition column, and partition columns are always placed after all data 
columns. Thus, when appending {{df2}}, the schemata of {{df2}} and the persisted 
table {{tbl11453}} differ in column order.
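To make the column reordering concrete, here is a minimal plain-Scala sketch that models a schema as an ordered list of (name, type) pairs. It is not Spark code: the names {{PartitionedSchemaSketch}} and {{persistedSchema}} are hypothetical, and the sketch assumes only the rule stated above, namely that partition columns (in {{partitionBy}} order) follow all data columns.

{code}
// Hypothetical model, not Spark's implementation: a schema is an ordered list of (columnName, dataType)
object PartitionedSchemaSketch {
  type Schema = Seq[(String, String)]

  // Data columns keep their relative order; partition columns are moved to the end
  def persistedSchema(input: Schema, partitionColumns: Seq[String]): Schema = {
    val dataCols = input.filterNot { case (name, _) => partitionColumns.contains(name) }
    val partCols = partitionColumns.flatMap(p => input.find(_._1 == p))
    dataCols ++ partCols
  }

  def main(args: Array[String]): Unit = {
    val df1Schema: Schema = Seq("i" -> "STRING", "j" -> "STRING")
    // Prints the persisted column order for partitionBy("i")
    println(persistedSchema(df1Schema, Seq("i")).map(_._1).mkString(", "))
  }
}
{code}

For {{df1}} this yields the column order {{j, i}}, which no longer matches the {{i, j}} order of {{df2}}.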

In the current master branch, {{CreateMetastoreDataSourceAsSelect}} simply applies 
the existing metastore schema to the input query plan ([see 
here|https://github.com/apache/spark/blob/75e05a5a964c9585dd09a2ef6178881929bab1f1/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/commands.scala#L225]), 
which is wrong. A projection should be used instead to adjust the column order.
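The difference between applying a schema positionally and projecting by name can be illustrated with another simplified plain-Scala model. This is a sketch under stated assumptions, not Spark's code: {{ProjectionSketch}}, {{Plan}}, {{applySchemaPositionally}} and {{projectByName}} are hypothetical names, and a "plan" is reduced to column names plus rows of values.

{code}
// Hypothetical model, not Spark's implementation
object ProjectionSketch {
  // A "plan" is reduced to column names plus rows whose values follow that column order
  final case class Plan(columns: Seq[String], rows: Seq[Seq[String]])

  // Problematic behavior: the table's column names are attached to the input by position,
  // so values can end up under the wrong column names
  def applySchemaPositionally(input: Plan, tableColumns: Seq[String]): Plan =
    input.copy(columns = tableColumns)

  // Projection: pick each table column from the input by name, in the table's column order
  def projectByName(input: Plan, tableColumns: Seq[String]): Plan = {
    val indices = tableColumns.map { c =>
      val idx = input.columns.indexOf(c)
      require(idx >= 0, s"column $c not found in input")
      idx
    }
    Plan(tableColumns, input.rows.map(row => indices.map(row)))
  }

  def main(args: Array[String]): Unit = {
    val df2 = Plan(Seq("i", "j"), Seq(Seq("3", "30")))
    val tableColumns = Seq("j", "i")
    println(applySchemaPositionally(df2, tableColumns)) // j = "3", i = "30": wrong
    println(projectByName(df2, tableColumns))           // j = "30", i = "3": correct
  }
}
{code}

With the positional approach, the partition value of {{df2}} would be read from the wrong column; the by-name projection keeps {{i = "3"}} and {{j = "30"}}.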

In branch-1.6, [this projection is added in 
{{InsertIntoHadoopFsRelation}}|https://github.com/apache/spark/blob/663a492f0651d757ea8e5aeb42107e2ece429613/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelation.scala#L99-L104], 
but it was removed in Spark 2.0. Replacing the aforementioned line in 
{{CreateMetastoreDataSourceAsSelect}} with a projection should be preferable.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
