[ https://issues.apache.org/jira/browse/SPARK-52355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17955281#comment-17955281 ]
Austin Warner commented on SPARK-52355:
---------------------------------------
I am preparing a PR for this now.
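
A minimal sketch of the direction, assuming the fix is an explicit VariantVal case ahead of the generic {{__dict__}}-based fallback (the stand-alone helper name below is mine, for illustration; the PR will modify the existing inference functions in types.py itself):

{code:python}
# Sketch only, not the actual patch: the real change belongs inside
# pyspark.sql.types. This stand-in just demonstrates the missing branch.
from pyspark.sql.types import DataType, VariantType, VariantVal


def _infer_type_sketch(obj) -> DataType:
    # Missing case: map a VariantVal directly to VariantType, before
    # inference falls through to the generic object path that currently
    # yields struct<metadata:binary,value:binary>.
    if isinstance(obj, VariantVal):
        return VariantType()
    raise TypeError("other cases elided; see _infer_type in types.py")


print(_infer_type_sketch(VariantVal.parseJson("[1]")))  # VariantType()
{code}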
> VariantVal schema improperly inferred as struct<metadata:binary,value:binary>
> -----------------------------------------------------------------------------
>
> Key: SPARK-52355
> URL: https://issues.apache.org/jira/browse/SPARK-52355
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 4.0.0
> Reporter: Austin Warner
> Priority: Minor
>
> When creating VariantVal objects locally in Python, the schema is improperly
> inferred as a struct with {{metadata}} and {{value}} binary fields instead of {{variant}}.
>
> {quote}{{>>> from pyspark.sql.types import VariantVal}}
> {{>>> df = spark.createDataFrame([(VariantVal.parseJson("[1]"),)], schema=['value'])}}
> {{>>> df.printSchema()}}
> {{root}}
> {{ |-- value: struct (nullable = true)}}
> {{ |    |-- metadata: binary (nullable = true)}}
> {{ |    |-- value: binary (nullable = true)}}
> {{>>> df.collect()}}
> {{[Row(value=Row(metadata=bytearray(b'\x01\x00\x00'), value=bytearray(b'\x03\x01\x00\x02\x0c\x01')))]}}
> {quote}
> When the schema is passed explicitly, everything works as intended:
> {quote}{{>>> from pyspark.sql.types import VariantVal}}
> {{>>> df = spark.createDataFrame([(VariantVal.parseJson("[1]"),)], schema='value variant')}}
> {{>>> df.printSchema()}}
> {{root}}
> {{ |-- value: variant (nullable = true)}}
> {{>>> df.collect()}}
> {{[Row(value=VariantVal(bytearray(b'\x03\x01\x00\x02\x0c\x01'), bytearray(b'\x01\x00\x00')))]}}
> {{>>> df.collect()[0].value.toJson()}}
> {{'[1]'}}
> {quote}
> This appears to be because the
> [{{pyspark.sql.types._infer_schema}}|https://github.com/apache/spark/blob/e3321aa44ea255365222c491657b709ef41dc460/python/pyspark/sql/types.py#L2325-L2380]
> function does not include a case for VariantVal objects, so inference falls
> through to the generic object branch, which builds a struct from the
> instance's {{__dict__}} ({{metadata}} and {{value}}, both binary), producing
> the {{struct<metadata:binary,value:binary>}} shape above.
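> As a workaround until inference handles VariantVal, the schema can also be
> built programmatically instead of via the DDL string above (illustrative
> snippet, equivalent to {{schema='value variant'}}):
> {code:python}
> from pyspark.sql.types import StructField, StructType, VariantType
>
> # A single nullable variant column, equivalent to 'value variant'.
> schema = StructType([StructField("value", VariantType(), True)])
> {code}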
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]