[
https://issues.apache.org/jira/browse/SPARK-26707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Apache Spark reassigned SPARK-26707:
------------------------------------
Assignee: Apache Spark
> Insert into table with single struct column fails
> -------------------------------------------------
>
> Key: SPARK-26707
> URL: https://issues.apache.org/jira/browse/SPARK-26707
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.2.3, 2.3.2, 2.4.0, 3.0.0
> Reporter: Bruce Robbins
> Assignee: Apache Spark
> Priority: Minor
>
> This works:
> {noformat}
> scala> sql("select named_struct('d1', 123) c1, 12 c2").write.format("parquet").saveAsTable("structtbl2")
> scala> sql("show create table structtbl2").show(truncate=false)
> +---------------------------------------------------------------------------+
> |createtab_stmt |
> +---------------------------------------------------------------------------+
> |CREATE TABLE `structtbl2` (`c1` STRUCT<`d1`: INT>, `c2` INT)
> USING parquet
> |
> +---------------------------------------------------------------------------+
> scala> sql("insert into structtbl2 values (struct(789), 17)")
> res2: org.apache.spark.sql.DataFrame = []
> scala> sql("select * from structtbl2").show
> +-----+---+
> | c1| c2|
> +-----+---+
> |[789]| 17|
> |[123]| 12|
> +-----+---+
> scala>
> {noformat}
> However, if the table's only column is the struct column, the insert does not work:
> {noformat}
> scala> sql("select named_struct('d1', 123) c1").write.format("parquet").saveAsTable("structtbl1")
> scala> sql("show create table structtbl1").show(truncate=false)
> +-----------------------------------------------------------------+
> |createtab_stmt |
> +-----------------------------------------------------------------+
> |CREATE TABLE `structtbl1` (`c1` STRUCT<`d1`: INT>)
> USING parquet
> |
> +-----------------------------------------------------------------+
> scala> sql("insert into structtbl1 values (struct(789))")
> org.apache.spark.sql.AnalysisException: cannot resolve '`col1`' due to data type mismatch: cannot cast int to struct<d1:int>;;
> 'InsertIntoHadoopFsRelationCommand file:/Users/brobbins/github/spark_upstream/spark-warehouse/structtbl1, false, Parquet, Map(path -> file:/Users/brobbins/github/spark_upstream/spark-warehouse/structtbl1), Append, CatalogTable(
> ...etc...
> {noformat}
> I can work around it by using a named_struct as the value:
> {noformat}
> scala> sql("insert into structtbl1 values (named_struct('d1',789))")
> res7: org.apache.spark.sql.DataFrame = []
> scala> sql("select * from structtbl1").show
> +-----+
> | c1|
> +-----+
> |[789]|
> |[123]|
> +-----+
> scala>
> {noformat}
> My guess is that I just don't understand how structs work. But maybe this is a bug.
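> An alternative workaround (an untested sketch, assuming the same structtbl1 table as above) might be to insert via a SELECT instead of a VALUES clause, since the struct expression is then produced directly as a column rather than being parsed as part of a VALUES row constructor:
> {noformat}
> scala> sql("insert into structtbl1 select struct(789) as c1")
> {noformat}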