[ https://issues.apache.org/jira/browse/SPARK-32110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-32110:
------------------------------------

    Assignee:     (was: Apache Spark)

> -0.0 vs 0.0 is inconsistent
> ---------------------------
>
>                 Key: SPARK-32110
>                 URL: https://issues.apache.org/jira/browse/SPARK-32110
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Robert Joseph Evans
>            Priority: Major
>
> This is related to SPARK-26021, where some things were fixed, but a lot is still inconsistent.
> When parsing SQL, {{-0.0}} is turned into {{0.0}}. This can produce results that appear to be correct but are totally inconsistent for the same operators.
> {code:java}
> scala> import spark.implicits._
> import spark.implicits._
>
> scala> spark.sql("SELECT 0.0 = -0.0").collect
> res0: Array[org.apache.spark.sql.Row] = Array([true])
>
> scala> Seq((0.0, -0.0)).toDF("a", "b").selectExpr("a = b").collect
> res1: Array[org.apache.spark.sql.Row] = Array([false])
> {code}
> The inconsistency also shows up in sorts:
> {code:java}
> scala> Seq((0.0, -100.0), (-0.0, 100.0), (0.0, 100.0), (-0.0, -100.0)).toDF("a", "b").orderBy("a", "b").collect
> res2: Array[org.apache.spark.sql.Row] = Array([-0.0,-100.0], [-0.0,100.0], [0.0,-100.0], [0.0,100.0])
> {code}
> But not in an equi-join or in an aggregate:
> {code:java}
> scala> Seq((0.0, -0.0)).toDF("a", "b").join(Seq((-0.0, 0.0)).toDF("r_a", "r_b"), $"a" === $"r_a").collect
> res3: Array[org.apache.spark.sql.Row] = Array([0.0,-0.0,-0.0,0.0])
>
> scala> Seq((0.0, 1.0), (-0.0, 1.0)).toDF("a", "b").groupBy("a").count.collect
> res6: Array[org.apache.spark.sql.Row] = Array([0.0,2])
> {code}
> This can lead to some very odd results, like an equi-join with a filter that logically should do nothing but ends up filtering the result down to nothing.
> {code:java}
> scala> Seq((0.0, -0.0)).toDF("a", "b").join(Seq((-0.0, 0.0)).toDF("r_a", "r_b"), $"a" === $"r_a" && $"a" <= $"r_a").collect
> res8: Array[org.apache.spark.sql.Row] = Array()
>
> scala> Seq((0.0, -0.0)).toDF("a", "b").join(Seq((-0.0, 0.0)).toDF("r_a", "r_b"), $"a" === $"r_a").collect
> res9: Array[org.apache.spark.sql.Row] = Array([0.0,-0.0,-0.0,0.0])
> {code}
> Hive never normalizes {{-0.0}} to {{0.0}}, so it exhibits non-IEEE-compliant behavior everywhere, but at least it is consistently odd.
> MySQL, Oracle, Postgres, and SQLite all appear to normalize {{-0.0}} to {{0.0}}.
> The root cause appears to be that the OpenJDK implementations of {{Double.compare}} and {{Float.compare}} place {{-0.0}} < {{0.0}}.
> This is not documented in the Javadocs, but it is clearly documented in the code, so it is not a "bug" that Java is going to fix.
> [https://github.com/openjdk/jdk/blob/a0a0539b0d3f9b6809c9759e697bfafd7b138ec1/src/java.base/share/classes/java/lang/Double.java#L1022-L1035]
> It is also consistent with the Javadoc for {{Double.equals}}:
> [https://docs.oracle.com/javase/8/docs/api/java/lang/Double.html#equals-java.lang.Object-]
> To be clear, I am filing this mostly to document the current state rather than because I think it needs to be fixed ASAP. It is a rare corner case, but it ended up being really frustrating to debug what was happening.
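> For reference, the root-cause divergence is easy to reproduce in a plain Scala REPL with no Spark involved; this snippet is an illustration of the OpenJDK behavior linked above, and the expected values are noted in the comments:
> {code:java}
> // Plain Scala (no Spark needed): three JVM notions of double equality/ordering disagree on -0.0.
> println(0.0 == -0.0)                                // true: primitive IEEE 754 equality treats the zeros as equal
> println(java.lang.Double.compare(-0.0, 0.0))        // -1: the total order used for sorting places -0.0 < 0.0
> println(java.lang.Double.valueOf(-0.0).equals(java.lang.Double.valueOf(0.0)))  // false: boxed equality distinguishes them
> println(java.lang.Double.doubleToRawLongBits(0.0))  // 0: +0.0 is the all-zero bit pattern
> println(java.lang.Double.doubleToRawLongBits(-0.0)) // -9223372036854775808: -0.0 has only the sign bit set
> {code}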
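> One possible workaround sketch, not something Spark itself provides: adding a positive zero maps {{-0.0}} to {{0.0}} and leaves every other double (including NaN) unchanged, so normalizing both sides before a comparison should make the operators above agree. Continuing the spark-shell session above:
> {code:java}
> // Hypothetical workaround: "x + 0.0" normalizes -0.0 to 0.0 and is a no-op for all other values.
> val eq = Seq((0.0, -0.0)).toDF("a", "b").selectExpr("(a + 0.0) = (b + 0.0)").collect
> // expected: Array([true]), now consistent with SELECT 0.0 = -0.0
>
> val joined = Seq((0.0, -0.0)).toDF("a", "b")
>   .join(Seq((-0.0, 0.0)).toDF("r_a", "r_b"),
>     ($"a" + 0.0) === ($"r_a" + 0.0) && ($"a" + 0.0) <= ($"r_a" + 0.0))
>   .collect
> // expected: Array([0.0,-0.0,-0.0,0.0]) -- the redundant filter no longer drops the matching row
> {code}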