[ https://issues.apache.org/jira/browse/SPARK-25040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16571126#comment-16571126 ]
Apache Spark commented on SPARK-25040:
--------------------------------------

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/22019

> Empty string for double and float types should be nulls in JSON
> ----------------------------------------------------------------
>
>                 Key: SPARK-25040
>                 URL: https://issues.apache.org/jira/browse/SPARK-25040
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0, 2.4.0
>            Reporter: Hyukjin Kwon
>            Priority: Minor
>
> The issue itself seems to be a behaviour change between 1.6 and 2.x for
> treating empty string as null or not in double and float.
> {code}
> {"a":"a1","int":1,"other":4.4}
> {"a":"a2","int":"","other":""}
> {code}
> code:
> {code}
> val config = new SparkConf().setMaster("local[5]").setAppName("test")
> val sc = SparkContext.getOrCreate(config)
> val sql = new SQLContext(sc)
> val file_path = this.getClass.getClassLoader.getResource("Sanity4.json").getFile
> val df = sql.read.schema(null).json(file_path)
> df.show(30)
> {code}
> then in spark 1.6, result is
> {code}
> +---+----+-----+
> |  a| int|other|
> +---+----+-----+
> | a1|   1|  4.4|
> | a2|null| null|
> +---+----+-----+
> {code}
> {code}
> root
>  |-- a: string (nullable = true)
>  |-- int: long (nullable = true)
>  |-- other: double (nullable = true)
> {code}
> but in spark 2.2, result is
> {code}
> +----+----+-----+
> |   a| int|other|
> +----+----+-----+
> |  a1|   1|  4.4|
> |null|null| null|
> +----+----+-----+
> {code}
> {code}
> root
>  |-- a: string (nullable = true)
>  |-- int: long (nullable = true)
>  |-- other: double (nullable = true)
> {code}
> Another easy reproducer:
> {code}
> spark.read.schema("a DOUBLE, b FLOAT")
>   .option("mode", "FAILFAST")
>   .json(Seq("""{"a":"", "b": ""}""", """{"a": 1.1, "b": 1.1}""").toDS)
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
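The behaviour the issue asks for can be illustrated outside Spark. The sketch below is not Spark's JSON parser; `FieldType` and `parseField` are hypothetical names used only to show the rule under discussion: when a JSON field carries an empty string but the schema expects a fractional type (double/float), the value should become null rather than a parse failure.

```scala
// Minimal standalone sketch (NOT Spark internals) of the rule proposed in
// SPARK-25040: an empty string parsed against a double/float field yields
// null (modelled here as None) instead of throwing.
object EmptyStringAsNull {
  sealed trait FieldType
  case object DoubleType extends FieldType
  case object FloatType  extends FieldType
  case object StringType extends FieldType

  // Returns Some(value) on success; None stands in for a null cell.
  def parseField(raw: String, tpe: FieldType): Option[Any] = tpe match {
    case DoubleType => if (raw.isEmpty) None else Some(raw.toDouble)
    case FloatType  => if (raw.isEmpty) None else Some(raw.toFloat)
    case StringType => Some(raw) // empty string is a valid string value
  }

  def main(args: Array[String]): Unit = {
    println(parseField("4.4", DoubleType)) // Some(4.4)
    println(parseField("", DoubleType))    // None, i.e. a null cell
    println(parseField("", StringType))    // Some(""), strings keep ""
  }
}
```

Under this rule the second input row `{"a":"a2","int":"","other":""}` yields null for the fractional column instead of failing the whole record in FAILFAST mode.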