yaniv oren created SPARK-31772:
----------------------------------

             Summary: Json schema reading is not consistent between int and string types
                 Key: SPARK-31772
                 URL: https://issues.apache.org/jira/browse/SPARK-31772
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 2.4.4
            Reporter: yaniv oren

When reading a JSON file with an explicit schema, an int value is converted to a string if the schema field is a string, but a string value is not converted to an int if the schema field is an int.

Sample code:

    from pyspark.sql.types import StructType, StructField, IntegerType, StringType

    read_schema = StructType([StructField("a", IntegerType()),
                              StructField("b", StringType())])
    df = self.spark_session.read.schema(read_schema).json("input/json/temp_test")
    df.show()

JSON input file temp_test:

    {"a": 1,"b": "b1"}
    {"a": 2,"b": "b2"}
    {"a": 3,"b": 3}
    {"a": "4","b": 4}

Actual:

    +----+----+
    |   a|   b|
    +----+----+
    |   1|  b1|
    |   2|  b2|
    |   3|   3|
    |null|null|
    +----+----+

Expected: the third line should be nulled like the fourth, because its b value is an int while the schema declares b as a string.
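The asymmetry described in this report can be modelled without Spark. The sketch below is plain Python, not Spark's implementation. The names `SCHEMA`, `convert`, and `read_line` are hypothetical, and the rule that one failed conversion nulls the whole row is an assumption taken from the "actual" output in this report.

```python
import json

# Hypothetical stand-in for the schema in the report: field name -> target type.
SCHEMA = {"a": int, "b": str}


def convert(value, target):
    """Convert one JSON value as the report describes, or raise ValueError."""
    if value is None:
        return None
    if target is str:
        # Reported behavior: a JSON number is accepted for a string field
        # and rendered as text.
        return value if isinstance(value, str) else json.dumps(value)
    if target is int:
        # Reported behavior: a JSON string is NOT accepted for an int field.
        if isinstance(value, int) and not isinstance(value, bool):
            return value
        raise ValueError(f"cannot read {value!r} as int")
    raise TypeError(f"unsupported target type: {target!r}")


def read_line(line, schema=SCHEMA):
    """Parse one JSON line; any conversion failure nulls the whole row (assumption)."""
    record = json.loads(line)
    try:
        return {name: convert(record.get(name), t) for name, t in schema.items()}
    except ValueError:
        return {name: None for name in schema}


# The four input lines from the report.
LINES = [
    '{"a": 1,"b": "b1"}',
    '{"a": 2,"b": "b2"}',
    '{"a": 3,"b": 3}',
    '{"a": "4","b": 4}',
]

rows = [read_line(line) for line in LINES]
for row in rows:
    print(row)
```

In this model the third row keeps `b` as the text "3", while the fourth row is nulled entirely because "4" is rejected for the int field. That is the inconsistency the report describes.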