[
https://issues.apache.org/jira/browse/HIVE-5878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841442#comment-13841442
]
Eric Hanson commented on HIVE-5878:
-----------------------------------
I'm not comfortable with this change. If the data type of expression and
aggregate results in Hive was reasonable before, I think it would be best to
leave the type the same as it was, for backward compatibility, so we don't
break people's applications.
Also, the vectorized implementation of aggregates was built to return the same
data types as the row-at-a-time implementation, for compatibility. It is
important that any changes in semantics or types be implemented in both the
row-at-a-time, and vectorized execution paths.
Different database systems make different choices about expression and
aggregate result types. For example, in SQL Server, avg applied to an int is an
int:
{code}
create table test(i int);
select avg(i) avg_i into res2 from test;
{code}
gives table res2 with this schema:
{code}
CREATE TABLE [dbo].[res2](
[avg_i] [int] NULL
)
{code}
> Hive standard avg UDAF returns double as the return type for some exact input
> types
> -----------------------------------------------------------------------------------
>
> Key: HIVE-5878
> URL: https://issues.apache.org/jira/browse/HIVE-5878
> Project: Hive
> Issue Type: Bug
> Components: Types, UDF
> Affects Versions: 0.12.0
> Reporter: Xuefu Zhang
> Assignee: Xuefu Zhang
> Attachments: HIVE-5878.1.patch, HIVE-5878.patch
>
>
> For standard, no-partial avg result, hive currently returns double as the
> result type.
> {code}
> hive> desc test;
> OK
> d int None
> Time taken: 0.051 seconds, Fetched: 1 row(s)
> hive> explain select avg(`d`) from test;
> ...
> Reduce Operator Tree:
> Group By Operator
> aggregations:
> expr: avg(VALUE._col0)
> bucketGroup: false
> mode: mergepartial
> outputColumnNames: _col0
> Select Operator
> expressions:
> expr: _col0
> type: double
> {code}
> However, exact types including integers and decimal should yield exact type.
> Here is what MySQL does:
> {code}
> mysql> desc test;
> +-------+--------------+------+-----+---------+-------+
> | Field | Type | Null | Key | Default | Extra |
> +-------+--------------+------+-----+---------+-------+
> | i | int(11) | YES | | NULL | |
> | b | tinyint(1) | YES | | NULL | |
> | d | double | YES | | NULL | |
> | s | varchar(5) | YES | | NULL | |
> | dd | decimal(5,2) | YES | | NULL | |
> +-------+--------------+------+-----+---------+-------+
> mysql> create table test62 as select avg(i) from test;
> mysql> desc test62;
> +-------+---------------+------+-----+---------+-------+
> | Field | Type | Null | Key | Default | Extra |
> +-------+---------------+------+-----+---------+-------+
> | avg(i) | decimal(14,4) | YES | | NULL | |
> +-------+---------------+------+-----+---------+-------+
> 1 row in set (0.00 sec)
> {code}
--
This message was sent by Atlassian JIRA
(v6.1#6144)