[ https://issues.apache.org/jira/browse/SPARK-39885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Vogelbacher updated SPARK-39885:
--------------------------------------
    Summary: Behavior differs between arrays_overlap and array_contains for negative 0.0
    (was: Behavior differs between array_overlap and array_contains for negative 0.0)

> Behavior differs between arrays_overlap and array_contains for negative 0.0
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-39885
>                 URL: https://issues.apache.org/jira/browse/SPARK-39885
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.2.2
>            Reporter: David Vogelbacher
>            Priority: Major
>
> {{array_contains([0.0], -0.0)}} will return true, while {{arrays_overlap([0.0], [-0.0])}} will return false. I think we generally want to treat -0.0 and 0.0 as the same (see 
> https://github.com/apache/spark/blob/e9eb28e27d10497c8b36774609823f4bbd2c8500/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/SQLOrderingUtil.scala#L28).
> However, the {{Double::equals}} method does not: it compares the underlying bit patterns, so it treats -0.0 and 0.0 as distinct. Therefore, we should either mark double as false in 
> [TypeUtils#typeWithProperEquals|https://github.com/apache/spark/blob/e9eb28e27d10497c8b36774609823f4bbd2c8500/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/TypeUtils.scala#L96], 
> or we should wrap it with our own equals method that handles this case.
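> For reference, the mismatch can be reproduced with plain Java doubles, independent of Spark (illustration only):
> {code:java}
> public class NegativeZeroEquals {
>     public static void main(String[] args) {
>         // Primitive comparison follows IEEE 754: the two zeros compare equal.
>         System.out.println(0.0d == -0.0d);                                      // true
>         // Double#equals compares bit patterns (doubleToLongBits), so it
>         // distinguishes -0.0 from 0.0.
>         System.out.println(Double.valueOf(0.0d).equals(Double.valueOf(-0.0d))); // false
>     }
> }
> {code}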
> Java code snippets showing the issue:
> {code:java}
> Dataset<Row> dataset = sparkSession.createDataFrame(
>     List.of(RowFactory.create(List.of(-0.0))),
>     DataTypes.createStructType(ImmutableList.of(
>         DataTypes.createStructField("doubleCol",
>             DataTypes.createArrayType(DataTypes.DoubleType), false))));
> Dataset<Row> df = dataset.withColumn(
>     "overlaps",
>     functions.arrays_overlap(functions.array(functions.lit(+0.0)), dataset.col("doubleCol")));
> List<Row> result = df.collectAsList(); // [[WrappedArray(-0.0),false]]
> {code}
> {code:java}
> Dataset<Row> dataset = sparkSession.createDataFrame(
>     List.of(RowFactory.create(-0.0)),
>     DataTypes.createStructType(ImmutableList.of(
>         DataTypes.createStructField("doubleCol", DataTypes.DoubleType, false))));
> Dataset<Row> df = dataset.withColumn(
>     "contains",
>     functions.array_contains(functions.array(functions.lit(+0.0)), dataset.col("doubleCol")));
> List<Row> result = df.collectAsList(); // [[-0.0,true]]
> {code}
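> A minimal sketch of the second option, i.e. an equals wrapper that follows the SQLOrderingUtil convention (the class and method names here are illustrative, not existing Spark APIs):
> {code:java}
> public class NormalizedDoubleEquals {
>     // Sketch only: equality that matches SQLOrderingUtil's ordering convention,
>     // i.e. -0.0 equals 0.0 and NaN equals NaN, unlike java.lang.Double#equals.
>     static boolean sqlDoubleEquals(double x, double y) {
>         // Primitive == already treats -0.0 and 0.0 as equal (IEEE 754) but is
>         // false for NaN == NaN, so NaN is handled explicitly.
>         return x == y || (Double.isNaN(x) && Double.isNaN(y));
>     }
>
>     public static void main(String[] args) {
>         System.out.println(sqlDoubleEquals(0.0d, -0.0d));            // true
>         System.out.println(sqlDoubleEquals(Double.NaN, Double.NaN)); // true
>     }
> }
> {code}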



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
