[ https://issues.apache.org/jira/browse/HIVE-18421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16320926#comment-16320926 ]
Matt McCline commented on HIVE-18421:
-------------------------------------
Well, I hear you, but given how crucial performance is, it isn't that simple.
Since Java does not have built-in support for detecting underflow/overflow
(e.g. something like an $OVERFLOW flag), you would end up adding if statements
(a Google search will turn up several variants) to each arithmetic operation.
Those branches often prevent the use of the fancy SIMD instructions and hurt
performance. Even with something like $OVERFLOW, that would probably still be
the case.
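To make the cost concrete, here is a minimal sketch (not Hive code; the class and method names are hypothetical) contrasting an unchecked element-wise subtraction over long[] columns with a checked one that applies the common sign-bit overflow test and an if per element. Whether a particular loop is auto-vectorized depends on the JIT, so the comments only indicate where the per-element branch can get in the way.

```java
import java.util.Arrays;

// Hypothetical sketch, not Hive code: element-wise subtraction over long[] columns.
final class SubtractSketch {

    // Unchecked: a tight, branch-free loop. A JIT may be able to compile
    // loops of this shape to SIMD instructions.
    static void subtractUnchecked(long[] a, long[] b, long[] out, int n) {
        for (int i = 0; i < n; i++) {
            out[i] = a[i] - b[i];
        }
    }

    // Checked: r = x - y overflows exactly when x and y have different signs
    // and r's sign differs from x's. The per-element if is the kind of branch
    // that can stop the loop from being vectorized.
    static boolean[] subtractChecked(long[] a, long[] b, long[] out, int n) {
        boolean[] overflow = new boolean[n];
        for (int i = 0; i < n; i++) {
            long x = a[i], y = b[i];
            long r = x - y;
            out[i] = r;
            if (((x ^ y) & (x ^ r)) < 0) {
                overflow[i] = true;
            }
        }
        return overflow;
    }

    public static void main(String[] args) {
        long[] a = {Long.MIN_VALUE, 54};
        long[] b = {1, 9};
        long[] out = new long[2];
        boolean[] ovf = subtractChecked(a, b, out, 2);
        System.out.println(Arrays.toString(out) + " " + Arrays.toString(ovf));
    }
}
```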
One option might be to generate two sets of vectorization classes: checked and
unchecked.
Writing the checked variants will take some care to make sure they stay fast.
And it isn't just +/-; the sum and avg aggregations, etc., need it too.
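A sketch of what the two class sets could look like for long-typed columns. The names LongSubtract, CheckedSubtract, UncheckedSubtract, CheckedLongSum and CheckedUncheckedDemo are hypothetical and do not correspond to Hive's generated classes. One way to keep the checked loops fast (an assumption, not a measured result) is to fold the per-element overflow test into a bitmask and test it once per batch instead of branching per element. The same idea carries over to a checked SUM, which AVG could presumably build on.

```java
// Hypothetical sketch of the "checked vs. unchecked" idea; the names are
// illustrative and are not Hive's generated vectorization classes.
interface LongSubtract {
    // Writes a[i] - b[i] into out[i]; returns true if any element overflowed.
    boolean evaluate(long[] a, long[] b, long[] out, int n);
}

// Unchecked variant: wraps silently and never reports overflow.
final class UncheckedSubtract implements LongSubtract {
    public boolean evaluate(long[] a, long[] b, long[] out, int n) {
        for (int i = 0; i < n; i++) {
            out[i] = a[i] - b[i];
        }
        return false;
    }
}

// Checked variant: instead of branching per element, OR the overflow sign
// bits together and test the accumulated mask once after the loop.
final class CheckedSubtract implements LongSubtract {
    public boolean evaluate(long[] a, long[] b, long[] out, int n) {
        long mask = 0;
        for (int i = 0; i < n; i++) {
            long x = a[i], y = b[i], r = x - y;
            out[i] = r;
            mask |= (x ^ y) & (x ^ r); // sign bit set iff this element overflowed
        }
        return mask < 0;
    }
}

// Checked SUM: the running total can overflow even when every input fits in
// a long. The flag is sticky and conservative: it is set if any intermediate
// addition overflowed, even if later additions wrap back into range.
final class CheckedLongSum {
    long sum;
    boolean overflowed;

    void aggregate(long[] v, int n) {
        long s = sum, mask = 0;
        for (int i = 0; i < n; i++) {
            long r = s + v[i];
            mask |= (s ^ r) & (v[i] ^ r); // both operand signs differ from the result's
            s = r;
        }
        sum = s;
        overflowed |= mask < 0;
    }
}

final class CheckedUncheckedDemo {
    // The variant would be chosen once (e.g. at plan time), not per row.
    static LongSubtract choose(boolean checked) {
        return checked ? new CheckedSubtract() : new UncheckedSubtract();
    }

    public static void main(String[] args) {
        long[] a = {Long.MIN_VALUE, 54}, b = {1, 9}, out = new long[2];
        System.out.println("checked overflow: " + choose(true).evaluate(a, b, out, 2));
        System.out.println("unchecked overflow: " + choose(false).evaluate(a, b, out, 2));
    }
}
```

The checked loop still pays for two XORs, an AND and an OR per element, so it is not free; the point of the mask is only to avoid a data-dependent branch inside the loop.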
> Vectorized execution does not handle integer overflows
> ------------------------------------------------------
>
> Key: HIVE-18421
> URL: https://issues.apache.org/jira/browse/HIVE-18421
> Project: Hive
> Issue Type: Bug
> Components: Vectorization
> Affects Versions: 2.1.1, 2.2.0, 3.0.0, 2.3.2
> Reporter: Vihang Karajgaonkar
> Assignee: Vihang Karajgaonkar
>
> In vectorized execution, arithmetic operations that cause integer overflows
> can give wrong results. The issue is reproducible with both ORC and Parquet.
> Simple test case to reproduce this issue:
> {noformat}
> set hive.vectorized.execution.enabled=true;
> create table parquettable (t1 tinyint, t2 tinyint) stored as parquet;
> insert into parquettable values (-104, 25), (-112, 24), (54, 9);
> select t1, t2, (t1-t2) as diff from parquettable where (t1-t2) < 50 order by diff desc;
> +-------+-----+-------+
> | t1 | t2 | diff |
> +-------+-----+-------+
> | -104 | 25 | 127 |
> | -112 | 24 | 120 |
> | 54 | 9 | 45 |
> +-------+-----+-------+
> {noformat}
> When vectorization is turned off, the same query returns only one row.
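The diff values in the vectorized output are what 8-bit two's-complement wrap-around produces: -104 - 25 = -129 wraps to 127, and -112 - 24 = -136 wraps to 120. The sketch below (the class TinyintWrap and its narrow method are hypothetical) reproduces those values with a Java (byte) cast. One reading consistent with both results, and an assumption rather than a confirmed root cause, is that the vectorized filter compares the un-narrowed difference against 50 (all three rows pass) while the projected column is narrowed to tinyint; with narrowed values throughout, only (54, 9) satisfies diff < 50.

```java
// Hypothetical sketch: narrow an exact difference to an 8-bit two's-complement
// value (as a Java (byte) cast does) and compare the filter outcome both ways.
final class TinyintWrap {

    static byte narrow(long wide) {
        return (byte) wide; // keeps the low 8 bits, i.e. wraps modulo 256
    }

    public static void main(String[] args) {
        long[][] rows = {{-104, 25}, {-112, 24}, {54, 9}};
        for (long[] r : rows) {
            long exact = r[0] - r[1];
            byte wrapped = narrow(exact);
            System.out.println("exact=" + exact + " wrapped=" + wrapped
                    + " exact<50=" + (exact < 50) + " wrapped<50=" + (wrapped < 50));
        }
    }
}
```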
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)