[
https://issues.apache.org/jira/browse/SPARK-51576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Wenchen Fan resolved SPARK-51576.
---------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed
Issue resolved by pull request 50343
[https://github.com/apache/spark/pull/50343]
> We should not be able to cast variants to non-nullable types in ANSI mode
> -------------------------------------------------------------------------
>
> Key: SPARK-51576
> URL: https://issues.apache.org/jira/browse/SPARK-51576
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Harsh Motwani
> Assignee: Harsh Motwani
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Variant is the only data type that, in ANSI mode, can be cast from a non-null
> value to a null value in the target data type. In particular, when casting
> variants to strings, variant nulls are cast to SQL nulls, e.g.
> `parse_json('null')::string` gives `null` even though the source
> (`parse_json('null')`) is not a null value.
> Currently, it is legal to cast from `array<variant, containsNull = false>` to
> `array<string, containsNull = false>`. However, this should not be legal since
> `array(parse_json('null'))` would give `array(null)` after this cast. The
> resulting data type would claim that the array does not contain nulls when
> it in fact does.
> The following demonstrates the problem. Here, we create an
> `array<string, containsNull = false>` that in fact contains a null.
> ```
> >>> from pyspark.sql.functions import col
> >>> from pyspark.sql.types import StringType, VariantType, ArrayType
> >>>
> >>> df = spark.sql("select array(parse_json('null')) arr")
> >>> df.printSchema()
> root
>  |-- arr: array (nullable = false)
>  |    |-- element: variant (containsNull = false)
> >>> df2 = df.select(col('arr').cast(ArrayType(StringType(), False)))
> >>> df2.selectExpr("arr[0] is null").show()
> +----------------+
> |(arr[0] IS NULL)|
> +----------------+
> |            true|
> +----------------+
> >>> df2.printSchema()
> root
>  |-- arr: array (nullable = false)
>  |    |-- element: string (containsNull = false)
> ```
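For readers who want the nullability rule from the description in isolation, here is a minimal pure-Python sketch. It is a toy model and not Spark's implementation: the `Atomic` and `Array` classes and the functions `cast_may_introduce_null` and `cast_is_legal` are hypothetical names made up for illustration, and the only rule it encodes is the one stated above (a cast out of a variant can turn a non-null value into a SQL null, so the target must admit nulls).

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Atomic:
    name: str  # e.g. "variant" or "string"


@dataclass(frozen=True)
class Array:
    element: "DataType"
    contains_null: bool


DataType = Union[Atomic, Array]


def cast_may_introduce_null(src: DataType, dst: DataType) -> bool:
    # Assumption of this sketch: only variant -> non-variant casts can map
    # a non-null source value (a variant null) to a SQL NULL.
    return src == Atomic("variant") and dst != Atomic("variant")


def cast_is_legal(src: DataType, dst: DataType) -> bool:
    if isinstance(src, Array) and isinstance(dst, Array):
        # Elements of the result may be null if the source elements could be,
        # or if the element-level cast itself can produce nulls.
        elements_may_be_null = src.contains_null or cast_may_introduce_null(
            src.element, dst.element
        )
        if elements_may_be_null and not dst.contains_null:
            return False
        return cast_is_legal(src.element, dst.element)
    # Toy model: every atomic-to-atomic cast is allowed.
    return isinstance(src, Atomic) and isinstance(dst, Atomic)


variant_arr = Array(Atomic("variant"), contains_null=False)
print(cast_is_legal(variant_arr, Array(Atomic("string"), contains_null=False)))  # False
print(cast_is_legal(variant_arr, Array(Atomic("string"), contains_null=True)))   # True
```

Under this model the cast from the demonstration is rejected, while the same cast to a target with `containsNull = true` remains allowed.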