[ https://issues.apache.org/jira/browse/HIVE-18421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16321044#comment-16321044 ]
Gopal V commented on HIVE-18421:
--------------------------------
bq. One option might be to generate 2 sets of vectorization classes: checked and unchecked.
That looks like a good option. The problem with checking is that the same code is used for bigint and tinyint: bigint doesn't need overflow handling, but tinyint does.
Because of vectorcodegen, this is actually possible to do via a template variable rather than by writing all the code from scratch.
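To make the checked/unchecked idea concrete, here is a minimal Java sketch. It assumes, as with Hive's LongColumnVector, that every integer family is held in a long[]; the class and method names, and modeling the template variable as a LongUnaryOperator, are hypothetical illustration choices and not what vectorcodegen actually emits.

```java
import java.util.Arrays;
import java.util.function.LongUnaryOperator;

/**
 * Hypothetical sketch (not Hive's generated code): one subtraction "template"
 * whose narrowing step stands in for a codegen template variable.
 */
public class CheckedSubtractSketch {

    // Unchecked variant (bigint): keep the 64-bit result; Java long arithmetic already wraps at 64 bits.
    static final LongUnaryOperator UNCHECKED = v -> v;

    // Checked variant (tinyint): narrow to 8 bits, the way a byte-typed subtraction wraps.
    static final LongUnaryOperator CHECKED_TINYINT = v -> (byte) v;

    /** Element-wise a[i] - b[i] over the first n rows, narrowed and written into out. */
    static void subtract(long[] a, long[] b, long[] out, int n, LongUnaryOperator narrow) {
        for (int i = 0; i < n; i++) {
            out[i] = narrow.applyAsLong(a[i] - b[i]);
        }
    }

    public static void main(String[] args) {
        long[] a = {100, 10};
        long[] b = {-100, 3};
        long[] out = new long[2];
        subtract(a, b, out, 2, UNCHECKED);
        System.out.println(Arrays.toString(out)); // [200, 7]
        subtract(a, b, out, 2, CHECKED_TINYINT);
        System.out.println(Arrays.toString(out)); // [-56, 7]
    }
}
```

In generated code the narrowing would presumably be inlined into each emitted class rather than passed as a function, so the bigint class would carry no extra per-row work.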
bq. And, it isn't just +/-, but it is also the sum and avg aggregations, etc.
sum(tinyint) returns bigint, so there is no overflow case there (which is different from the non-vectorized case).
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFSum.java#L122
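A small Java sketch of why accumulating tinyint values into a bigint has no realistic overflow case. The class and method names are hypothetical, and the values are assumed to arrive widened into a long[] as in a LongColumnVector; the bound itself is plain arithmetic on the Java type limits.

```java
/**
 * Hypothetical sketch: sum(tinyint) accumulated into a bigint (long).
 */
public class TinyintSumSketch {

    /** Sums the first n tinyint values (already widened into long[]). */
    static long sum(long[] values, int n) {
        long acc = 0;
        for (int i = 0; i < n; i++) {
            acc += values[i];
        }
        return acc;
    }

    /** Smallest row count whose tinyint sum can fall outside the long range. */
    static long rowsNeededToOverflow() {
        // The largest per-row magnitude is |Byte.MIN_VALUE| = 128 = 2^7 and
        // Long.MIN_VALUE = -2^63, so 2^56 rows of -128 sum to exactly Long.MIN_VALUE.
        // One more row is the first that can overflow.
        return Long.MIN_VALUE / Byte.MIN_VALUE + 1;
    }

    public static void main(String[] args) {
        System.out.println(sum(new long[] {-104, -112, 54}, 3)); // -162
        System.out.println(rowsNeededToOverflow());              // 72057594037927937
    }
}
```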
> Vectorized execution does not handle integer overflows
> ------------------------------------------------------
>
> Key: HIVE-18421
> URL: https://issues.apache.org/jira/browse/HIVE-18421
> Project: Hive
> Issue Type: Bug
> Components: Vectorization
> Affects Versions: 2.1.1, 2.2.0, 3.0.0, 2.3.2
> Reporter: Vihang Karajgaonkar
> Assignee: Vihang Karajgaonkar
>
> In vectorized execution, arithmetic operations that cause integer overflows
> can give wrong results. The issue is reproducible with both ORC and Parquet.
> A simple test case to reproduce it:
> {noformat}
> set hive.vectorized.execution.enabled=true;
> create table parquettable (t1 tinyint, t2 tinyint) stored as parquet;
> insert into parquettable values (-104, 25), (-112, 24), (54, 9);
> select t1, t2, (t1-t2) as diff from parquettable where (t1-t2) < 50 order by diff desc;
> +-------+-----+-------+
> | t1 | t2 | diff |
> +-------+-----+-------+
> | -104 | 25 | 127 |
> | -112 | 24 | 120 |
> | 54 | 9 | 45 |
> +-------+-----+-------+
> {noformat}
> When vectorization is turned off, the same query produces only one row (54, 9, 45).
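The discrepancy in the reproduction above can be sketched in Java, under the assumption (consistent with the outputs shown) that row mode wraps tinyint - tinyint to a tinyint before comparing, while the vectorized filter compares the unnarrowed 64-bit difference. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the WHERE clause (t1 - t2) < 50, evaluated two ways
 * over the same tinyint rows. Returns the indices of rows that pass.
 */
public class OverflowFilterSketch {

    /** Assumed row-mode semantics: the difference wraps to a tinyint before the comparison. */
    static List<Integer> filterRowMode(byte[] t1, byte[] t2) {
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < t1.length; i++) {
            byte diff = (byte) (t1[i] - t2[i]); // -104 - 25 = -129 wraps to 127
            if (diff < 50) kept.add(i);
        }
        return kept;
    }

    /** Assumed vectorized behavior: the difference is compared as a 64-bit long, never narrowed. */
    static List<Integer> filterUnchecked(byte[] t1, byte[] t2) {
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < t1.length; i++) {
            long diff = (long) t1[i] - t2[i]; // -129 stays -129, so it passes the filter
            if (diff < 50) kept.add(i);
        }
        return kept;
    }

    public static void main(String[] args) {
        byte[] t1 = {-104, -112, 54};
        byte[] t2 = {25, 24, 9};
        System.out.println(filterRowMode(t1, t2));   // [2]
        System.out.println(filterUnchecked(t1, t2)); // [0, 1, 2]
    }
}
```

Narrowing -129 and -136 to a byte gives 127 and 120, the same diff values the vectorized query prints, which is why those rows look as if they should have been filtered out.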
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)