Austin Warner created SPARK-52355:
-------------------------------------

             Summary: VariantVal schema improperly inferred as struct<metadata:binary,value:binary>
                 Key: SPARK-52355
                 URL: https://issues.apache.org/jira/browse/SPARK-52355
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 4.0.0
            Reporter: Austin Warner


When creating VariantVal objects locally in Python, the schema is improperly 
inferred as a struct with metadata and value fields.
 
{quote}{{>>> from pyspark.sql.types import VariantVal}}
{{>>> df = spark.createDataFrame([(VariantVal.parseJson("[1]"),)], schema=['value'])}}
{{>>> df.printSchema()}}
{{root}}
{{|-- value: struct (nullable = true)}}
{{| |-- metadata: binary (nullable = true)}}
{{| |-- value: binary (nullable = true)}}
{{>>> df.collect()}}
{{[Row(value=Row(metadata=bytearray(b'\x01\x00\x00'), value=bytearray(b'\x03\x01\x00\x02\x0c\x01')))]}}
{quote}
When the schema is passed explicitly, the column is correctly typed as variant and the collected value comes back as a VariantVal:
{quote}{{>>> from pyspark.sql.types import VariantVal}}
{{>>> df = spark.createDataFrame([(VariantVal.parseJson("[1]"),)], schema='value variant')}}
{{>>> df.printSchema()}}
{{root}}
{{|-- value: variant (nullable = true)}}
{{>>> df.collect()}}
{{[Row(value=VariantVal(bytearray(b'\x03\x01\x00\x02\x0c\x01'), bytearray(b'\x01\x00\x00')))]}}
{{>>> df.collect()[0].value.toJson()}}
{{'[1]'}}
{quote}
This appears to be because the 
[{{pyspark.sql.types._infer_schema}}|https://github.com/apache/spark/blob/e3321aa44ea255365222c491657b709ef41dc460/python/pyspark/sql/types.py#L2325-L2380]
 function does not include a case for VariantVal objects, so they are not mapped to the variant type during inference.
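
To illustrate the suspected cause, here is a toy model in plain Python. It is a sketch, not PySpark's code: FakeVariantVal, infer_type_without_case and infer_type_with_case are hypothetical names, and it assumes (not verified against the linked source) that inference falls back to an object's attributes, sorted by name, when no dedicated case matches.

```python
# Toy model only: every name here is a hypothetical stand-in, not a PySpark internal.


class FakeVariantVal:
    """Stand-in for VariantVal: an object holding two binary attributes."""

    def __init__(self, value, metadata):
        self.value = value
        self.metadata = metadata


def infer_type_without_case(obj):
    """Inference with no variant branch: objects fall back to their attributes."""
    if isinstance(obj, (bytes, bytearray)):
        return "binary"
    if hasattr(obj, "__dict__"):
        # Assumed fallback: treat the attributes, sorted by name, as struct fields.
        fields = sorted(vars(obj).items())
        inner = ",".join(
            f"{name}:{infer_type_without_case(val)}" for name, val in fields
        )
        return f"struct<{inner}>"
    raise TypeError(f"unsupported type: {type(obj).__name__}")


def infer_type_with_case(obj):
    """Same inference, but variant values are recognised before the fallback."""
    if isinstance(obj, FakeVariantVal):
        return "variant"
    return infer_type_without_case(obj)


v = FakeVariantVal(bytearray(b"\x03\x01\x00\x02\x0c\x01"), bytearray(b"\x01\x00\x00"))
print(infer_type_without_case(v))
print(infer_type_with_case(v))
```

Under that assumption, the first call reproduces the struct shape reported above, and adding a dedicated check ahead of the fallback yields variant instead.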


