[ 
https://issues.apache.org/jira/browse/HIVE-5996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843972#comment-13843972
 ] 

Xuefu Zhang commented on HIVE-5996:
-----------------------------------

For information, the follow is the text from SQL-92[1] standard w.r.t SUM 
function:
{quote}
            b) If SUM is specified and DT is exact numeric with scale
              S, then the data type of the result is exact numeric with
              implementation-defined precision and scale S.
{quote}
For DT as long, currently Hive returns long, which doesn't violate the 
standard. However, such implementation is problematic as demonstrated in this 
JIRA. Plus, for decimal, Hive sum function accommodates at least 10 billion 
rows of data. Letting sum(long) return long is not able to uphold that 
assurance. Thus, we need to change the return type to make the function more 
useful.

[1] http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt


> Query for sum of a long column of a table with only two rows produces wrong 
> result
> ----------------------------------------------------------------------------------
>
>                 Key: HIVE-5996
>                 URL: https://issues.apache.org/jira/browse/HIVE-5996
>             Project: Hive
>          Issue Type: Bug
>          Components: UDF
>    Affects Versions: 0.12.0
>            Reporter: Xuefu Zhang
>            Assignee: Xuefu Zhang
>
> {code}
> hive> desc test2;
> OK
> l                     bigint                  None                
> hive> select * from test2;                                 
> OK
> 6666666666666666666
> 5555555555555555555
> hive> select sum(l) from test2;
> OK
> -6224521851487329395
> {code}
> It's believed that a wrap-around error occurred. It's surprising that it 
> happens only with two rows. Same query in MySQL returns:
> {code}
> mysql> select sum(l) from test;
> +----------------------+
> | sum(l)               |
> +----------------------+
> | 12222222222222222221 |
> +----------------------+
> 1 row in set (0.00 sec)
> {code}
> Hive should accommodate large number of rows. Overflowing with only two rows 
> is very unusable.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)

Reply via email to