[
https://issues.apache.org/jira/browse/SPARK-25313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16605335#comment-16605335
]
Apache Spark commented on SPARK-25313:
--------------------------------------
User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/22346
> Fix regression in FileFormatWriter output schema
> ------------------------------------------------
>
> Key: SPARK-25313
> URL: https://issues.apache.org/jira/browse/SPARK-25313
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: Gengliang Wang
> Assignee: Gengliang Wang
> Priority: Major
> Fix For: 2.4.0
>
>
> In the following example:
> val location = "/tmp/t"
> val df = spark.range(10).toDF("id")
> df.write.format("parquet").saveAsTable("tbl")
> spark.sql("CREATE VIEW view1 AS SELECT id FROM tbl")
> spark.sql(s"CREATE TABLE tbl2(ID long) USING parquet LOCATION '$location'")
> spark.sql("INSERT OVERWRITE TABLE tbl2 SELECT ID FROM view1")
> println(spark.read.parquet(location).schema)
> spark.table("tbl2").show()
> The column name in the written schema will be id instead of ID, so the last
> query shows nothing from tbl2.
> Enabling the debug messages shows that the output name is changed from ID
> to id, and that the outputColumns in InsertIntoHadoopFsRelationCommand are
> then changed by RemoveRedundantAliases.
> To guarantee correctness, we should change the output columns from
> `Seq[Attribute]` to `Seq[String]` so that the optimizer cannot replace
> their names.
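
The reasoning behind the proposed change can be illustrated with a small toy model. This is only a sketch in plain Scala, not Spark's actual code: `Attr`, `removeAlias`, and `OutputNamesSketch` are hypothetical names invented for illustration, and `removeAlias` merely stands in for the kind of rewrite an optimizer rule such as RemoveRedundantAliases might perform. It shows that names held inside attribute objects can be swapped by a rewrite, while names captured as plain strings cannot.

```scala
object OutputNamesSketch {
  // Hypothetical stand-in for a Catalyst attribute: a name plus a unique id.
  final case class Attr(name: String, exprId: Long)

  // Hypothetical stand-in for an alias-removing optimizer rule: every attribute
  // carrying the alias's id is replaced by the aliased child, name included.
  def removeAlias(attrs: Seq[Attr], aliasId: Long, child: Attr): Seq[Attr] =
    attrs.map(a => if (a.exprId == aliasId) child else a)

  def main(args: Array[String]): Unit = {
    val child = Attr("id", 1L)   // column as produced by the view
    val aliased = Attr("ID", 2L) // column as named by the INSERT query

    // Output columns kept as attributes: the rewrite replaces the name.
    val asAttributes = removeAlias(Seq(aliased), aliasId = 2L, child = child)

    // Output columns kept as strings: the rewrite has nothing to replace.
    val asStrings = Seq(aliased).map(_.name)

    println(asAttributes.map(_.name)) // prints List(id)
    println(asStrings)                // prints List(ID)
  }
}
```

In this model the string form keeps the user-specified casing because it is decoupled from the expression tree that the rewrite operates on.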