[ 
https://issues.apache.org/jira/browse/SPARK-51576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-51576.
---------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 50343
[https://github.com/apache/spark/pull/50343]

> We should not be able to cast variants to non-nullable types in ANSI mode
> -------------------------------------------------------------------------
>
>                 Key: SPARK-51576
>                 URL: https://issues.apache.org/jira/browse/SPARK-51576
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: Harsh Motwani
>            Assignee: Harsh Motwani
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>
> Variant is the only data type that, in ANSI mode, can cast a non-null 
> value to a null value in the target data type. In particular, when casting 
> variants to strings, variant nulls are cast to SQL nulls, i.e. 
> `parse_json('null')::string` gives `null` even though the source 
> (`parse_json('null')`) is not a null value.
> Currently, it is legal to cast from `array<variant, containsNull = false>` to 
> `array<string, containsNull = false>`. However, this should not be legal, since 
> `array(parse_json('null'))` would become `array(null)` after this cast: the 
> resulting data type claims that the array contains no nulls when in fact it 
> does.
> The following demonstrates the problem. Here, we create an 
> `array<string, containsNull = false>` that in fact contains a null.
> ```
> >>> from pyspark.sql.functions import col
> >>> from pyspark.sql.types import StringType, VariantType, ArrayType
> >>> 
> >>> df = spark.sql("select array(parse_json('null')) arr")
> >>> df.printSchema()
> root
>  |-- arr: array (nullable = false)
>  |    |-- element: variant (containsNull = false)
> >>> df2 = df.select(col('arr').cast(ArrayType(StringType(), False)))
> >>> df2.selectExpr("arr[0] is null").show()
> +----------------+
> |(arr[0] IS NULL)|
> +----------------+
> |            true|
> +----------------+
> >>> df2.printSchema()
> root
>  |-- arr: array (nullable = false)
>  |    |-- element: string (containsNull = false)
> ```
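
The rule described in the issue can be sketched as a toy model in plain Python. This is not Spark's implementation and does not reflect the code in the pull request; `Atomic`, `Array`, `cast_may_produce_null`, and `is_nullability_safe` are hypothetical names introduced only for illustration. The one assumption carried over from the description is that casting a variant to another type may turn a non-null variant value into a SQL null.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Atomic:
    """Hypothetical stand-in for a scalar type such as "variant" or "string"."""
    name: str


@dataclass(frozen=True)
class Array:
    """Hypothetical stand-in for an array type with a containsNull flag."""
    element: "DataType"
    contains_null: bool


DataType = Union[Atomic, Array]


def cast_may_produce_null(src: DataType, dst: DataType) -> bool:
    # Assumption from the issue text: a variant null becomes a SQL null
    # when the value is cast away from the variant type.
    return isinstance(src, Atomic) and src.name == "variant" and dst != src


def is_nullability_safe(src: DataType, dst: DataType) -> bool:
    # Only array-to-array casts carry a containsNull flag in this toy model.
    if isinstance(src, Array) and isinstance(dst, Array):
        element_can_be_null = src.contains_null or cast_may_produce_null(
            src.element, dst.element
        )
        # Reject a target that promises "no nulls" when nulls may appear.
        if element_can_be_null and not dst.contains_null:
            return False
        # Nested arrays are checked recursively.
        return is_nullability_safe(src.element, dst.element)
    return True


variant_array = Array(Atomic("variant"), contains_null=False)
print(is_nullability_safe(variant_array, Array(Atomic("string"), False)))
print(is_nullability_safe(variant_array, Array(Atomic("string"), True)))
```

Under this model, the cast shown in the demonstration (to a string array with `containsNull = false`) is rejected, while the same cast to a target with `containsNull = true` is accepted.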



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
