[ https://issues.apache.org/jira/browse/HIVE-18421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16321327#comment-16321327 ]

Matt McCline commented on HIVE-18421:
-------------------------------------

Yes, as you said: two different sets of LongColAddLongColumn[Checked] classes,
selected by an env variable. An optional attribute could be added to
VectorExpressionDescriptor to mark a class as checked.
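A minimal sketch of the two-variant idea, assuming "checked" means the 64-bit result is narrowed back to the declared integer type so that it wraps the way row-mode Java arithmetic would. The class names, the ColAddCol interface, and the boolean flag (standing in for the env variable) are hypothetical and are not Hive's actual API.

```java
import java.util.Arrays;

// Hypothetical sketch: two variants of a vectorized column add, chosen once by
// a boolean flag that stands in for the env variable / config setting.
// Names are illustrative only, not real Hive classes.
public class CheckedAddSketch {

    interface ColAddCol {
        void evaluate(long[] a, long[] b, long[] out, int n);
    }

    // Unchecked: plain 64-bit add; overflow of the narrower SQL type goes unnoticed.
    static final class LongColAddLongColumn implements ColAddCol {
        public void evaluate(long[] a, long[] b, long[] out, int n) {
            for (int i = 0; i < n; i++) {
                out[i] = a[i] + b[i];
            }
        }
    }

    // Checked: same add, then narrowed to a 32-bit int result type so the
    // value wraps exactly as Java int arithmetic would.
    static final class LongColAddLongColumnChecked implements ColAddCol {
        public void evaluate(long[] a, long[] b, long[] out, int n) {
            for (int i = 0; i < n; i++) {
                out[i] = (int) (a[i] + b[i]);
            }
        }
    }

    // The flag is read once when the expression is built, not per row.
    static ColAddCol choose(boolean checked) {
        return checked ? new LongColAddLongColumnChecked() : new LongColAddLongColumn();
    }

    public static void main(String[] args) {
        long[] a = {Integer.MAX_VALUE, 1};
        long[] b = {1, 2};
        long[] out = new long[2];
        choose(false).evaluate(a, b, out, 2);
        System.out.println(Arrays.toString(out)); // [2147483648, 3]
        choose(true).evaluate(a, b, out, 2);
        System.out.println(Arrays.toString(out)); // [-2147483648, 3]
    }
}
```

Selecting the class once up front keeps the inner loops branch-free, which is the point of having two class sets rather than a per-row check.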

You could add checked class sub-variations for byte, short, and int that
still use LongColumnVector. Adding {Byte|Short|Int}ColumnVector is difficult
because of everything else (comparison, aggregation, etc.), where the number of
classes would explode. Also, I think Gopal would point out that longs have the
best performance even though some bits are wasted. Further, adding new integer
column types would impact ORC, Parquet, etc., and large swathes of other code
(e.g. GroupBy). I don't believe it would be worth the extensive plumbing cost.
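To illustrate byte/short/int sub-variations that keep LongColumnVector-style long[] storage, here is a hedged sketch: the result is computed in 64 bits and then truncated to the declared type. The IntType enum and the method names are hypothetical. With the tinyint values from the HIVE-18421 repro it yields [127, 120, 45].

```java
import java.util.Arrays;

// Hypothetical sketch: a checked subtract that keeps long[] storage (as
// LongColumnVector does) but wraps each result to the declared SQL integer
// type. Names are illustrative, not actual Hive classes.
public class NarrowingSubtractSketch {

    enum IntType { TINYINT, SMALLINT, INT, BIGINT }

    // Two's-complement truncation, matching Java byte/short/int overflow.
    static long wrap(long v, IntType t) {
        switch (t) {
            case TINYINT:  return (byte) v;
            case SMALLINT: return (short) v;
            case INT:      return (int) v;
            default:       return v; // BIGINT already uses all 64 bits
        }
    }

    static void subtractChecked(long[] a, long[] b, long[] out, int n, IntType t) {
        for (int i = 0; i < n; i++) {
            out[i] = wrap(a[i] - b[i], t);
        }
    }

    public static void main(String[] args) {
        long[] t1 = {-104, -112, 54};
        long[] t2 = {25, 24, 9};
        long[] diff = new long[3];
        subtractChecked(t1, t2, diff, 3, IntType.TINYINT);
        System.out.println(Arrays.toString(diff)); // [127, 120, 45]
    }
}
```

A real implementation would more likely generate one class per type so the inner loop has no per-row switch; the switch here only keeps the sketch short.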

> Vectorized execution does not handle integer overflows
> ------------------------------------------------------
>
>                 Key: HIVE-18421
>                 URL: https://issues.apache.org/jira/browse/HIVE-18421
>             Project: Hive
>          Issue Type: Bug
>          Components: Vectorization
>    Affects Versions: 2.1.1, 2.2.0, 3.0.0, 2.3.2
>            Reporter: Vihang Karajgaonkar
>            Assignee: Vihang Karajgaonkar
>
> In vectorized execution, arithmetic operations that cause integer overflows
> can give wrong results. The issue is reproducible with both ORC and Parquet.
> Simple test case to reproduce this issue:
> {noformat}
> set hive.vectorized.execution.enabled=true;
> create table parquettable (t1 tinyint, t2 tinyint) stored as parquet;
> insert into parquettable values (-104, 25), (-112, 24), (54, 9);
> select t1, t2, (t1-t2) as diff from parquettable where (t1-t2) < 50 order by 
> diff desc;
> +-------+-----+-------+
> |  t1   | t2  | diff  |
> +-------+-----+-------+
> | -104  | 25  | 127   |
> | -112  | 24  | 120   |
> | 54    | 9   | 45    |
> +-------+-----+-------+
> {noformat}
> When vectorization is turned off, the same query produces only one row.
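The mismatch can be reproduced with plain Java arithmetic, with no Hive classes involved. This sketch assumes, as the one-row result with vectorization off suggests, that row mode computes tinyint - tinyint as a wrapping byte, while the vectorized filter compares the unwrapped long difference. The class and method names are hypothetical.

```java
// Hypothetical sketch of the reported mismatch: the same predicate
// (t1 - t2) < 50 evaluated with byte (wrapping) vs long (non-wrapping) math.
public class OverflowFilterRepro {

    // Row-mode style: tinyint - tinyint stays a byte and wraps on overflow.
    static byte rowModeDiff(byte t1, byte t2) {
        return (byte) (t1 - t2);
    }

    // Vectorized style: values live in long[], so the difference never wraps.
    static long vectorizedDiff(long t1, long t2) {
        return t1 - t2;
    }

    static int countRowMode(byte[] t1, byte[] t2) {
        int n = 0;
        for (int i = 0; i < t1.length; i++) {
            if (rowModeDiff(t1[i], t2[i]) < 50) n++;
        }
        return n;
    }

    static int countVectorized(byte[] t1, byte[] t2) {
        int n = 0;
        for (int i = 0; i < t1.length; i++) {
            if (vectorizedDiff(t1[i], t2[i]) < 50) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        byte[] t1 = {-104, -112, 54};
        byte[] t2 = {25, 24, 9};
        System.out.println("row mode:   " + countRowMode(t1, t2));    // 1
        System.out.println("vectorized: " + countVectorized(t1, t2)); // 3
    }
}
```

The diff column in the vectorized output already shows wrapped values (127, 120), which suggests the narrowing happened when the result was emitted rather than when the filter ran.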



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
