Benedikt Maria Beckermann created SPARK-30767:
-------------------------------------------------

             Summary: from_json changes times of timestamps by several minutes without error
                 Key: SPARK-30767
                 URL: https://issues.apache.org/jira/browse/SPARK-30767
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.4.4
         Environment: We ran the example code with Spark 2.4.4 via Azure Databricks with Databricks Runtime version 6.3 on an interactive cluster. We first encountered the issue on a job cluster running a streaming application on Databricks Runtime version 5.4.
            Reporter: Benedikt Maria Beckermann

When a JSON text column includes a timestamp in a format like {{2020-01-25T06:39:45.887429Z}}, the function {{from_json(Column,StructType)}} parses it as a timestamp, but the resulting value is shifted by several minutes. Spark does not throw any kind of error and continues to run with the incorrect timestamp.

The following Scala snippet reproduces the issue.

{code:scala}
import org.apache.spark.sql._
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

val df = Seq("""{"time":"2020-01-25T06:39:45.887429Z"}""").toDF("json")
val struct = new StructType().add("time", TimestampType, nullable = true)

val timeDF = df
  .withColumn("time (string)", get_json_object(col("json"), "$.time"))
  .withColumn("time casted directly (CORRECT)", col("time (string)").cast(TimestampType))
  .withColumn("time casted via struct (INVALID)", from_json(col("json"), struct))

// display() is a Databricks notebook helper; outside Databricks use timeDF.show(false)
display(timeDF)
{code}

Output:
||json||time (string)||time casted directly (CORRECT)||time casted via struct (INVALID)||
|{"time":"2020-01-25T06:39:45.887429Z"}|2020-01-25T06:39:45.887429Z|2020-01-25T06:39:45.887+0000|{"time":"2020-01-25T06:54:32.429+0000"}|
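
The size of the shift in the reported output is not arbitrary: 06:54:32.429 is exactly 887,429 milliseconds after 06:39:45. That is what one would see if the six fractional digits {{887429}} were counted as milliseconds rather than microseconds. The report does not state a cause, so this reading is an assumption. The sketch below only checks the arithmetic with {{java.time}} and does not involve Spark; the object and value names are illustrative.

```scala
import java.time.{Duration, Instant}

object FractionAsMillis {
  // Whole-second part and fractional digits of the reported input
  // 2020-01-25T06:39:45.887429Z.
  val wholeSeconds: Instant = Instant.parse("2020-01-25T06:39:45Z")
  val fractionDigits: String = "887429"

  // Intended reading: the six digits are microseconds.
  val asMicros: Instant = wholeSeconds.plusNanos(fractionDigits.toLong * 1000L)

  // Assumed misreading: the same digits are counted as milliseconds.
  val asMillis: Instant = wholeSeconds.plusMillis(fractionDigits.toLong)

  // Offset introduced by the assumed misreading.
  val shift: Duration = Duration.between(wholeSeconds, asMillis)

  def main(args: Array[String]): Unit = {
    println(asMicros) // 2020-01-25T06:39:45.887429Z
    println(asMillis) // 2020-01-25T06:54:32.429Z
    println(shift)    // PT14M47.429S
  }
}
```

The second printed value matches the time in the INVALID column (06:54:32.429), which is consistent with the assumption but does not by itself prove where in Spark the misreading occurs.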