[ 
https://issues.apache.org/jira/browse/HIVE-18421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16356455#comment-16356455
 ] 

Aihua Xu commented on HIVE-18421:
---------------------------------

[~vihangk1] Sorry for the late reply. I left a comment in RB. Basically, I 
don't follow why we need both the CHECKED and UNCHECKED implementations. It 
seems we should only have the CHECKED one if the UNCHECKED one can generate 
incorrect results. The user would get incorrect results without any notice, 
right?

Of course, even if we want to support the UNCHECKED implementation, we should 
error out/fail the query when there is an overflow, so the user knows to set 
the flag to true. BTW: how much performance impact does this have, and why? 
(I don't exactly follow the previous discussion.)
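
For illustration only, here is a minimal Java sketch of the distinction in 
question. It is not Hive's vectorized code: the class and method names are 
hypothetical, and it only assumes that tinyint maps to a Java byte. The 
unchecked variant narrows the result and wraps silently, while the checked 
variant fails when the result does not fit.

```java
// Hypothetical sketch; names are illustrative and not taken from Hive.
final class TinyintOverflowSketch {

    // Unchecked: compute in int, then narrow to byte. A result outside
    // [-128, 127] silently wraps around (e.g. -129 becomes 127).
    static byte subtractUnchecked(byte a, byte b) {
        return (byte) (a - b);
    }

    // Checked: compute in int and fail loudly if the result does not fit
    // in a byte, so the caller learns about the overflow.
    static byte subtractChecked(byte a, byte b) {
        int wide = a - b;
        if (wide < Byte.MIN_VALUE || wide > Byte.MAX_VALUE) {
            throw new ArithmeticException(
                "tinyint overflow: " + a + " - " + b + " = " + wide);
        }
        return (byte) wide;
    }

    public static void main(String[] args) {
        // Rows from the test case in the issue description.
        System.out.println(subtractUnchecked((byte) -104, (byte) 25)); // 127
        System.out.println(subtractUnchecked((byte) -112, (byte) 24)); // 120
        System.out.println(subtractChecked((byte) 54, (byte) 9));      // 45
        try {
            subtractChecked((byte) -104, (byte) 25);
        } catch (ArithmeticException e) {
            System.out.println("failed as expected: " + e.getMessage());
        }
    }
}
```

The wrapped values 127 and 120 match the diff column reported in the issue 
for the first two rows; the extra range comparison in the checked variant is 
the kind of per-value work the performance question is about.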

> Vectorized execution handles overflows in a different manner than 
> non-vectorized execution
> ------------------------------------------------------------------------------------------
>
>                 Key: HIVE-18421
>                 URL: https://issues.apache.org/jira/browse/HIVE-18421
>             Project: Hive
>          Issue Type: Bug
>          Components: Vectorization
>    Affects Versions: 2.1.1, 2.2.0, 3.0.0, 2.3.2
>            Reporter: Vihang Karajgaonkar
>            Assignee: Vihang Karajgaonkar
>            Priority: Major
>         Attachments: HIVE-18421.01.patch, HIVE-18421.02.patch, 
> HIVE-18421.03.patch, HIVE-18421.04.patch, HIVE-18421.05.patch, 
> HIVE-18421.06.patch, HIVE-18421.07.patch
>
>
> In vectorized execution, arithmetic operations that cause integer overflows 
> can give wrong results. The issue is reproducible with both ORC and Parquet.
> A simple test case to reproduce this issue:
> {noformat}
> set hive.vectorized.execution.enabled=true;
> create table parquettable (t1 tinyint, t2 tinyint) stored as parquet;
> insert into parquettable values (-104, 25), (-112, 24), (54, 9);
> select t1, t2, (t1-t2) as diff from parquettable where (t1-t2) < 50 order by 
> diff desc;
> +-------+-----+-------+
> |  t1   | t2  | diff  |
> +-------+-----+-------+
> | -104  | 25  | 127   |
> | -112  | 24  | 120   |
> | 54    | 9   | 45    |
> +-------+-----+-------+
> {noformat}
> When vectorization is turned off the same query produces only one row.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
