[ https://issues.apache.org/jira/browse/SPARK-33401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17229015#comment-17229015 ]
Takeshi Yamamuro commented on SPARK-33401:
------------------------------------------

I think this is not a bug but expected behaviour, because a user-defined type is not compatible with its internal data type.

> Vector type column is not possible to create using spark SQL
> ------------------------------------------------------------
>
>                 Key: SPARK-33401
>                 URL: https://issues.apache.org/jira/browse/SPARK-33401
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.0.1
>            Reporter: Pavlo Borshchenko
>            Priority: Major
>
>
> Created a table with a vector type column:
> {code:java}
> import org.apache.spark.mllib.linalg.Vector
> import org.apache.spark.mllib.linalg.VectorUDT
> import org.apache.spark.mllib.linalg.Vectors
> import spark.implicits._ // provides toDF(); already in scope in spark-shell
>
> case class Test(features: Vector)
>
> Seq(Test(Vectors.dense(Array(1d, 2d, 3d)))).toDF()
>   .write
>   .mode("overwrite")
>   .saveAsTable("pborshchenko.test_vector_spark_0911_1")
> {code}
>
> Show the CREATE TABLE statement for this table:
> {code:java}
> spark.sql("SHOW CREATE TABLE pborshchenko.test_vector_spark_0911_1")
> {code}
> Got:
> {code:java}
> CREATE TABLE `pborshchenko`.`test_vector_spark_0911_1` (
>   `features` STRUCT<`type`: TINYINT, `size`: INT, `indices`: ARRAY<INT>, `values`: ARRAY<DOUBLE>>)
> USING parquet
> {code}
> Create the same table from SQL, with _2 at the end of the name:
> {code:java}
> spark.sql("""CREATE TABLE `pborshchenko`.`test_vector_spark_0911_2` (
>   `features` STRUCT<`type`: TINYINT, `size`: INT, `indices`: ARRAY<INT>, `values`: ARRAY<DOUBLE>>)
> USING parquet""")
> {code}
> Try to insert new values into the table created from SQL:
>
> {code:java}
> import org.apache.spark.mllib.linalg.Vector
> import org.apache.spark.mllib.linalg.VectorUDT
> import org.apache.spark.mllib.linalg.Vectors
> import org.apache.spark.sql.SaveMode // needed for SaveMode.Append
> import spark.implicits._             // provides toDF(); already in scope in spark-shell
>
> case class Test(features: Vector)
>
> Seq(Test(Vectors.dense(Array(1d, 2d, 3d)))).toDF()
>   .write
>   .mode(SaveMode.Append)
>   .insertInto("pborshchenko.test_vector_spark_0911_2")
> {code}
>
> Got:
>
> {code:java}
> AnalysisException: Cannot write incompatible data to table '`pborshchenko`.`test_vector_spark_0911_2`':
> - Cannot write 'features': struct<type:tinyint,size:int,indices:array<int>,values:array<double>> is incompatible with struct<type:tinyint,size:int,indices:array<int>,values:array<double>>;
> - Cannot write 'features': struct<type:tinyint,size:int,indices:array<int>,values:array<double>> is incompatible with struct<type:tinyint,size:int,indices:array<int>,values:array<double>>;
>   at org.apache.spark.sql.catalyst.analysis.TableOutputResolver$.resolveOutputColumns(TableOutputResolver.scala:72)
>   at org.apache.spark.sql.execution.datasources.PreprocessTableInsertion.org$apache$spark$sql$execution$datasources$PreprocessTableInsertion$$preprocess(rules.scala:467)
>   at org.apache.spark.sql.execution.datasources.PreprocessTableInsertion$$anonfun$apply$3.applyOrElse(rules.scala:494)
>   at org.apache.spark.sql.execution.datasources.PreprocessTableInsertion$$anonfun$apply$3.applyOrElse(rules.scala:486)
>   at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDown$2(AnalysisHelper.scala:112)
> {code}
>
> The reason is that the table created from Spark SQL has the type STRUCT, not vector, even though this struct is the correct representation of the vector type.
> AC: It should be possible to create a table with a vector type column using Spark SQL and then write to it without any errors.
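Per the comment, the analyzer treats a `VectorUDT` column and its internal struct as different types, even though both print as the same `struct<...>` string in the error. So data bound for a struct-typed table has to be converted to that struct explicitly. Below is a minimal plain-Scala sketch of such a conversion, assuming the field encoding we read in mllib's `VectorUDT.serialize` (`type` 0 for sparse, 1 for dense; `size` and `indices` NULL for dense vectors). Please verify it against your Spark version. `VectorStruct` and `VectorStructs` are hypothetical names, not Spark API.

```scala
// Hypothetical plain-Scala mirror (not Spark API) of the struct that
// SHOW CREATE TABLE printed for the vector column:
//   STRUCT<type: TINYINT, size: INT, indices: ARRAY<INT>, values: ARRAY<DOUBLE>>
// Option models the columns that are NULL for dense vectors.
final case class VectorStruct(
    `type`: Byte,              // assumed encoding: 0 = sparse, 1 = dense
    size: Option[Int],         // vector length; None for dense vectors
    indices: Option[Seq[Int]], // active indices; None for dense vectors
    values: Seq[Double])       // all values (dense) or active values (sparse)

object VectorStructs {
  // Dense vector: only the values array is stored; size/indices stay NULL.
  def fromDense(values: Array[Double]): VectorStruct =
    VectorStruct(1.toByte, None, None, values.toSeq)

  // Sparse vector: size, active indices and their values are all stored.
  def fromSparse(size: Int, indices: Array[Int], values: Array[Double]): VectorStruct = {
    require(indices.length == values.length, "indices and values must have equal length")
    require(indices.forall(i => i >= 0 && i < size), "indices must lie in [0, size)")
    VectorStruct(0.toByte, Some(size), Some(indices.toSeq), values.toSeq)
  }

  // Densify a struct back into a plain array, e.g. to check a round trip.
  def toArray(s: VectorStruct): Array[Double] = s.`type`.toInt match {
    case 1 => s.values.toArray
    case 0 =>
      val out = new Array[Double](s.size.getOrElse(0))
      s.indices.getOrElse(Seq.empty).zip(s.values).foreach { case (i, v) => out(i) = v }
      out
    case other => throw new IllegalArgumentException(s"unknown vector type tag: $other")
  }
}
```

In a Spark job, one could map each vector with the mllib extractors (`case DenseVector(v) => VectorStructs.fromDense(v)`, `case SparseVector(n, i, v) => VectorStructs.fromSparse(n, i, v)`), wrap the result in a case class with a single `features` field, and call `insertInto` on that Dataset. This has not been checked against 3.0.1.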