[ 
https://issues.apache.org/jira/browse/SPARK-31772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-31772.
----------------------------------
    Resolution: Not A Problem

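This works as designed. The last JSON record does not match the schema (field 
"a" is declared as an integer but holds the string "4"), so the record fails 
to parse. You can declare the fields as string types and cast them manually 
afterwards.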

> Json schema reading is not consistent between int and string types
> ------------------------------------------------------------------
>
>                 Key: SPARK-31772
>                 URL: https://issues.apache.org/jira/browse/SPARK-31772
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.4.4
>            Reporter: yaniv oren
>            Priority: Major
>
> When reading a JSON file with a schema, an int value is converted to a 
> string if the field is declared as a string, but a string value is not 
> converted to an int if the field is declared as an int.
> Sample Code:
> from pyspark.sql.types import IntegerType, StringType, StructField, StructType
>
> read_schema = StructType([
>     StructField("a", IntegerType()),
>     StructField("b", StringType()),
> ])
> df = self.spark_session.read.schema(read_schema).json("input/json/temp_test")
> df.show()
>  
> Contents of the JSON file temp_test:
> {"a": 1,"b": "b1"}
> {"a": 2,"b": "b2"}
> {"a": 3,"b": 3}
> {"a": "4","b": 4}
>  
> actual:
> +----+----+
> |   a|   b|
> +----+----+
> |   1|  b1|
> |   2|  b2|
> |   3|   3|
> |null|null|
> +----+----+
>  
> expected:
> The third line should be nulled like the fourth line, since its "b" value 
> is an int while the schema declares "b" as a string.



