Abdeali Kothari created SPARK-35741:
---------------------------------------

             Summary: Variance of 1 record gives NULL in Spark 3.x and NaN in 
Spark 2.x
                 Key: SPARK-35741
                 URL: https://issues.apache.org/jira/browse/SPARK-35741
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.1.1
            Reporter: Abdeali Kothari


A few test cases in my suite started failing after moving to Spark 3.

I traced the cause to the VARIANCE() function: it previously returned NaN when run on a single record, but now it returns NULL.

 
{code:java}
export SPARK_HOME=/usr/local/hadoop/spark-2.4.6-bin-hadoop2.7/
python
>>> import pyspark
>>> spark = pyspark.sql.SparkSession.builder.getOrCreate()
>>> spark.sql('SELECT  VARIANCE(1)').show()
+---------------------------+
|var_samp(CAST(1 AS DOUBLE))|
+---------------------------+
|                        NaN|
+---------------------------+{code}
With spark 3:
{code:java}
export SPARK_HOME=/usr/local/hadoop/spark-3.1.1-bin-hadoop2.7/
python
>>> import pyspark
>>> spark = pyspark.sql.SparkSession.builder.getOrCreate()
>>> spark.sql('SELECT VARIANCE(1)').show()
+---------------------------+
|variance(CAST(1 AS DOUBLE))|
+---------------------------+
|                       null|
+---------------------------+
{code}
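For context on why a single record is a special case: the sample variance divides the sum of squared deviations by (n - 1), so with one record the result is 0/0. Spark 2.x surfaced that as NaN, while Spark 3.x returns NULL. A minimal pure-Python sketch of the arithmetic (an illustration only, not Spark's actual implementation):
{code:python}
def var_samp(xs):
    """Sample variance: sum of squared deviations over (n - 1).

    With a single record the denominator is 0. Spark 2.x surfaced the
    0/0 as NaN; Spark 3.x returns NULL, modeled here as None.
    """
    n = len(xs)
    if n < 2:
        return None  # Spark 3.x behavior; Spark 2.x returned float('nan')
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

print(var_samp([1.0]))       # single record -> None (NULL in Spark 3.x)
print(var_samp([1.0, 3.0]))  # -> 2.0
{code}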
 

Just thought I'd report it here, as I didn't see this behavior change mentioned in any of the release notes or migration guides.

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
