[jira] [Updated] (SPARK-40988) Spark3 partition column value is not validated with user provided schema.

Hyukjin Kwon (Jira) Tue, 01 Nov 2022 18:23:14 -0700


     [ 
https://issues.apache.org/jira/browse/SPARK-40988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Hyukjin Kwon updated SPARK-40988:
---------------------------------
    Fix Version/s:     (was: 3.4.0)

> Spark3 partition column value is not validated with user provided schema.
> -------------------------------------------------------------------------
>
>                 Key: SPARK-40988
>                 URL: https://issues.apache.org/jira/browse/SPARK-40988
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0, 3.1.0, 3.2.0, 3.3.0
>            Reporter: Ranga Reddy
>            Priority: Major
>
> Spark 3 does not validate the partition column type when inserting data, 
> whereas Hive throws an exception when a value of a different type is 
> inserted.
> *Spark Code:*
>  
> {code:java}
> scala> val tableName="test_partition_table"
> tableName: String = test_partition_table
> scala> spark.sql(s"DROP TABLE IF EXISTS $tableName")
> res0: org.apache.spark.sql.DataFrame = []
> scala> spark.sql(s"CREATE EXTERNAL TABLE $tableName ( id INT, name STRING ) PARTITIONED BY (age INT) LOCATION 'file:/tmp/spark-warehouse/$tableName'")
> res1: org.apache.spark.sql.DataFrame = []
> scala> spark.sql("SHOW tables").show(truncate=false)
> +---------+---------------------+-----------+
> |namespace|tableName            |isTemporary|
> +---------+---------------------+-----------+
> |default  |test_partition_table |false      |
> +---------+---------------------+-----------+
> scala> spark.sql("SET spark.sql.sources.validatePartitionColumns").show(50, false)
> +------------------------------------------+-----+
> |key                                       |value|
> +------------------------------------------+-----+
> |spark.sql.sources.validatePartitionColumns|true |
> +------------------------------------------+-----+
> scala> spark.sql(s"""INSERT INTO $tableName partition (age=25) VALUES (1, 
> 'Ranga')""")
> res4: org.apache.spark.sql.DataFrame = []
> scala> spark.sql(s"show partitions $tableName").show(50, false)
> +---------+
> |partition|
> +---------+
> |age=25   |
> +---------+
> scala> spark.sql(s"select * from $tableName").show(50, false)
> +---+-----+---+
> |id |name |age|
> +---+-----+---+
> |1  |Ranga|25 |
> +---+-----+---+
> scala> spark.sql(s"""INSERT INTO $tableName partition (age=\"test_age\") 
> VALUES (2, 'Nishanth')""")
> res7: org.apache.spark.sql.DataFrame = []
> scala> spark.sql(s"show partitions $tableName").show(50, false)
> +------------+
> |partition   |
> +------------+
> |age=25      |
> |age=test_age|
> +------------+
> scala> spark.sql(s"select * from $tableName").show(50, false)
> +---+--------+----+
> |id |name    |age |
> +---+--------+----+
> |1  |Ranga   |25  |
> |2  |Nishanth|null|
> +---+--------+----+ {code}
> *Hive Code:*
>  
>  
> {code:java}
> > INSERT INTO test_partition_table partition (age="test_age2") VALUES (3, 
> > 'Nishanth');
> Error: Error while compiling statement: FAILED: SemanticException [Error 
> 10248]: Cannot add partition column age of type string as it cannot be 
> converted to type int (state=42000,code=10248){code}
>  
> *Expected Result:*
> When *spark.sql.sources.validatePartitionColumns=true*, Spark should validate 
> the partition value against the column's data type and throw an exception if 
> a value of the wrong type is provided.
> *Reference:*
> [https://spark.apache.org/docs/3.3.1/sql-migration-guide.html#data-sources]
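
The expected behaviour described above can be illustrated with a minimal, Spark-free sketch. Everything in it is an assumption made for illustration: `PartitionValueCheck`, `ColType`, and `validatePartitionValue` are hypothetical names, not Spark APIs, and the error text is only modelled on the Hive message quoted in the report. It shows the kind of check the reporter asks for: reject a static partition value that cannot be converted to the declared partition column type.

```scala
import scala.util.Try

// Hypothetical illustration only; none of these names exist in Spark.
object PartitionValueCheck {
  // The two column types that appear in the report's example table.
  sealed trait ColType
  case object IntCol extends ColType
  case object StringCol extends ColType

  // Returns Right(converted value) when the raw partition value fits the
  // declared type, or Left(error message) when it does not.
  def validatePartitionValue(column: String, raw: String, declared: ColType): Either[String, Any] =
    declared match {
      case StringCol => Right(raw)
      case IntCol =>
        // toInt throws NumberFormatException for non-numeric input;
        // Try captures it so it can be reported as a validation error.
        Try(raw.trim.toInt).toEither.left.map { _ =>
          s"Cannot add partition column $column of type string " +
            "as it cannot be converted to type int"
        }
    }
}
```

Under these assumptions, `validatePartitionValue("age", "25", IntCol)` yields a `Right`, while `validatePartitionValue("age", "test_age", IntCol)` yields a `Left` carrying the error, which corresponds to the second `INSERT` in the report failing instead of creating the `age=test_age` partition.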



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
