Daniel Darabos created SPARK-25146:
--------------------------------------

             Summary: avg() returns null on some decimals
                 Key: SPARK-25146
                 URL: https://issues.apache.org/jira/browse/SPARK-25146
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.3.1, 2.3.0
            Reporter: Daniel Darabos


We compute some numbers in the 0-10 range in a pipeline using Spark SQL, then 
average them. In some cases the average comes out as {{null}}, to our surprise 
(and disappointment).

After a bit of digging it looks like these numbers have ended up with the 
{{decimal(37,30)}} type. I've got a Spark Shell (2.3.0 and 2.3.1) repro with 
this type:

{{scala> (1 to 10000).map(_*0.001).toDF.createOrReplaceTempView("x")}}

{{scala> spark.sql("select cast(value as decimal(37, 30)) as v from x").createOrReplaceTempView("x")}}

{{scala> spark.sql("select avg(v) from x").show}}

{{+------+}}
{{|avg(v)|}}
{{+------+}}
{{|  null|}}
{{+------+}}

With up to 4471 numbers in the input, the average is computed correctly. With 
4472 or more numbers the result is {{null}}.
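
A quick arithmetic check on that threshold, as a sketch in plain Scala with no Spark needed. It assumes the inputs are {{1 to n}} scaled by 0.001, as in the repro. The object and method names ({{AvgThreshold}}, {{sumUpTo}}, {{firstOverflow}}) are made up for illustration. It shows that n = 4472 is exactly where the running sum first reaches 10000, that is, where it first needs a fifth integer digit. Whether {{avg()}} internally casts the sum to a type with only four integer digits (for example {{decimal(38,34)}}) is an assumption that this report does not confirm. The code only demonstrates that the two thresholds coincide.

```scala
// Plain Scala, no Spark dependency. Hypothetical helper names.
object AvgThreshold {
  // Exact sum of k * 0.001 for k = 1..n, computed as n(n+1)/2 thousandths.
  def sumUpTo(n: Int): BigDecimal =
    BigDecimal(n.toLong * (n + 1) / 2) / 1000

  // Smallest n whose sum reaches `limit` (10000 = first five-digit integer part).
  def firstOverflow(limit: BigDecimal = BigDecimal(10000)): Int =
    Iterator.from(1).find(n => sumUpTo(n) >= limit).get

  def main(args: Array[String]): Unit = {
    println(sumUpTo(4471))   // 9997.156
    println(sumUpTo(4472))   // 10001.628
    println(firstOverflow()) // 4472
  }
}
```

If that guess is right, the failure would depend on the magnitude of the sum rather than on the row count as such, which might also be consistent with {{sum()}} and {{count()}} succeeding on their own.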

Now I'll just change these numbers to {{double}}. But we got the types entirely 
automatically. We never asked for {{decimal}}. If this is the default type, 
it's important to support averaging a handful of them. (Sorry for the 
bitterness. I like {{double}} more. :))

Curiously, {{sum()}} works. And {{count()}} too. So it's quite the surprise 
that {{avg()}} fails.



