[ https://issues.apache.org/jira/browse/SPARK-23835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16420809#comment-16420809 ]
Michael Armbrust commented on SPARK-23835:
------------------------------------------

/cc [~cloud_fan]

> When Dataset.as converts column from nullable to non-nullable type, null Doubles are converted silently to -1
> -------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-23835
> URL: https://issues.apache.org/jira/browse/SPARK-23835
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.3.0
> Reporter: Joseph K. Bradley
> Priority: Major
>
> I constructed a DataFrame with a nullable java.lang.Double column (and an
> extra Double column). I then converted it to a Dataset using
> ```as[(Double, Double)]```. When the Dataset is shown, it has a null. When
> it is collected and printed, the null is silently converted to a -1.
>
> Code snippet to reproduce this:
> {code}
> val localSpark = spark
> import localSpark.implicits._
> val df = Seq[(java.lang.Double, Double)](
>   (1.0, 2.0),
>   (3.0, 4.0),
>   (Double.NaN, 5.0),
>   (null, 6.0)
> ).toDF("a", "b")
> df.show() // OUTPUT 1: has null
> df.printSchema()
> val data = df.as[(Double, Double)]
> data.show() // OUTPUT 2: has null
> data.collect().foreach(println) // OUTPUT 3: has -1
> {code}
>
> OUTPUT 1 and 2:
> {code}
> +----+---+
> |   a|  b|
> +----+---+
> | 1.0|2.0|
> | 3.0|4.0|
> | NaN|5.0|
> |null|6.0|
> +----+---+
> {code}
>
> OUTPUT 3:
> {code}
> (1.0,2.0)
> (3.0,4.0)
> (NaN,5.0)
> (-1.0,6.0)
> {code}
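For readers hitting the behavior reported above, one possible way to keep the null visible is to decode the nullable column as `Option[Double]` instead of a primitive `Double`. The following is only a sketch under that assumption and is not a fix proposed in the issue; the object name `NullableDoubleSketch`, the helper `collectWithOptions`, and the local-mode `SparkSession` setup are illustrative names that do not appear in the report.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper object; not part of the original report.
object NullableDoubleSketch {

  // Builds the same DataFrame as the reproduction, but decodes column "a"
  // as Option[Double] so that a SQL null is represented as None instead of
  // being forced into a primitive Double.
  def collectWithOptions(spark: SparkSession): Array[(Option[Double], Double)] = {
    import spark.implicits._
    val df = Seq[(java.lang.Double, Double)](
      (1.0, 2.0),
      (3.0, 4.0),
      (Double.NaN, 5.0),
      (null, 6.0)
    ).toDF("a", "b")
    df.as[(Option[Double], Double)].collect()
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local single-threaded session is sufficient for the demo.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SPARK-23835-sketch")
      .getOrCreate()
    try collectWithOptions(spark).foreach(println)
    finally spark.stop()
  }
}
```

With this encoding the row containing the null should come back with `None` in the first position rather than -1.0. If rows with a missing value are not needed at all, dropping them before the conversion (for example with `df.na.drop(Seq("a"))`) is another option.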