[jira] [Commented] (SPARK-33401) Vector type column is not possible to create using spark SQL

Takeshi Yamamuro (Jira) Mon, 09 Nov 2020 22:51:45 -0800


    [ 
https://issues.apache.org/jira/browse/SPARK-33401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17229015#comment-17229015
 ]


Takeshi Yamamuro commented on SPARK-33401:
------------------------------------------

I think this is not a bug but expected behaviour, because a user-defined 
type (here, the vector UDT) is not compatible with its internal data type 
(the STRUCT that SHOW CREATE TABLE prints). 
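
To make the distinction concrete, here is a minimal sketch, assuming a 
spark-shell session on Spark 3.0.x with spark-mllib on the classpath. The 
names `udt` and `internal` are only illustrative. It compares the vector UDT 
with the struct it is stored as, which is the only form SQL DDL can declare:

```scala
// Sketch only: compares the vector UDT with its internal storage type.
import org.apache.spark.mllib.linalg.VectorUDT
import org.apache.spark.sql.types.{DataType, StructType}

// The user-defined type that a Vector column carries in a DataFrame schema.
val udt: DataType = new VectorUDT()

// The plain struct the UDT is stored as; a SQL CREATE TABLE declares this.
val internal: StructType = new VectorUDT().sqlType

// Field names of the internal struct, matching the SHOW CREATE TABLE output.
println(internal.fieldNames.mkString(", "))

// Equality check between the UDT and its internal struct.
println(udt == internal)
```

If the two types compare as unequal, as I would expect, a table declared 
with the plain STRUCT has a column type different from the DataFrame's 
vector column, even when both are rendered with the same struct text.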

> Vector type column is not possible to create using spark SQL
> ------------------------------------------------------------
>
>                 Key: SPARK-33401
>                 URL: https://issues.apache.org/jira/browse/SPARK-33401
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.0.1
>            Reporter: Pavlo Borshchenko
>            Priority: Major
>
>  
> Create a table with a vector type column:
> {code:java}
> import org.apache.spark.mllib.linalg.Vector
> import org.apache.spark.mllib.linalg.VectorUDT
> import org.apache.spark.mllib.linalg.Vectors
> case class Test(features: Vector) 
> Seq(Test(Vectors.dense(Array(1d, 2d, 3d)))).toDF()
>  .write
>  .mode("overwrite")
>  .saveAsTable("pborshchenko.test_vector_spark_0911_1")
> {code}
>  
> Show the create table statement for this created table:
> {code:java}
> spark.sql("SHOW CREATE TABLE pborshchenko.test_vector_spark_0911_1"){code}
> Got:
> {code:java}
> CREATE TABLE `pborshchenko`.`test_vector_spark_0911_1` (
>  `features` STRUCT<`type`: TINYINT, `size`: INT, `indices`: ARRAY<INT>, 
> `values`: ARRAY<DOUBLE>>)
> USING parquet{code}
> Create the same table from SQL, with suffix 2 at the end of its name:
> {code:java}
> spark.sql("""CREATE TABLE `pborshchenko`.`test_vector_spark_0911_2` (
>  `features` STRUCT<`type`: TINYINT, `size`: INT, `indices`: ARRAY<INT>,
>  `values`: ARRAY<DOUBLE>>)
> USING parquet"""){code}
> Try to insert new values into the table created from SQL:
>  
> {code:java}
> import org.apache.spark.mllib.linalg.Vector
> import org.apache.spark.mllib.linalg.VectorUDT
> import org.apache.spark.mllib.linalg.Vectors
> import org.apache.spark.sql.SaveMode
> case class Test(features: Vector)
> Seq(Test(Vectors.dense(Array(1d, 2d, 3d)))).toDF()
>  .write
>  .mode(SaveMode.Append)
>  .insertInto("pborshchenko.test_vector_spark_0911_2")
> {code}
>  
> Got:
>  
> {code:java}
>  AnalysisException: Cannot write incompatible data to table 
> '`pborshchenko`.`test_vector_spark_0911_2`': - Cannot write 'features': 
> struct<type:tinyint,size:int,indices:array<int>,values:array<double>> is 
> incompatible with 
> struct<type:tinyint,size:int,indices:array<int>,values:array<double>>;      - 
> Cannot write 'features': 
> struct<type:tinyint,size:int,indices:array<int>,values:array<double>> is 
> incompatible with 
> struct<type:tinyint,size:int,indices:array<int>,values:array<double>>; at 
> org.apache.spark.sql.catalyst.analysis.TableOutputResolver$.resolveOutputColumns(TableOutputResolver.scala:72)
>  at 
> org.apache.spark.sql.execution.datasources.PreprocessTableInsertion.org$apache$spark$sql$execution$datasources$PreprocessTableInsertion$$preprocess(rules.scala:467)
>  at 
> org.apache.spark.sql.execution.datasources.PreprocessTableInsertion$$anonfun$apply$3.applyOrElse(rules.scala:494)
>  at 
> org.apache.spark.sql.execution.datasources.PreprocessTableInsertion$$anonfun$apply$3.applyOrElse(rules.scala:486)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDown$2(AnalysisHelper.scala:112)
>     {code}
>  
> The reason is that the table created from Spark SQL has the type STRUCT, 
> not vector, even though this struct is the right representation for the 
> vector type.
> AC: It should be possible to create a table with a vector type column using 
> Spark SQL and then write to it without any errors.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
