[ https://issues.apache.org/jira/browse/HIVE-18421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16357722#comment-16357722 ]
Gopal V commented on HIVE-18421:
--------------------------------
The performance question was primarily about inserting the check in the inner
loop.
Your approach, with the Util doing a second traversal, is likely to add
overhead only where the checked impl is needed for correctness.
If you add a small test to VectorizedBench in hive-jmh, I can go over the
assembly and make sure we're JIT'ing the bigint operators exactly the same way
and also check the L1 cache hit rate (the second iteration in the overflow util
should get a 100% hit rate, hopefully).
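
For reference, a minimal sketch of the two-pass shape discussed here, not the code from the patch. The class and method names (TwoPassOverflowSketch, subtract, wrapToTinyint) are hypothetical, and it assumes the column values are held in long[] arrays. The first loop stays free of any check; the second traversal narrows the results and would run only when the checked behavior is needed.

```java
import java.util.Arrays;

public class TwoPassOverflowSketch {

    /** Pass 1: plain subtraction in the long domain, with no check in the loop body. */
    public static void subtract(long[] left, long[] right, long[] out, int n) {
        for (int i = 0; i < n; i++) {
            out[i] = left[i] - right[i];
        }
    }

    /** Pass 2: second traversal that wraps each result into tinyint range. */
    public static void wrapToTinyint(long[] values, int n) {
        for (int i = 0; i < n; i++) {
            // The narrowing cast keeps the low 8 bits; assigning back sign-extends to long.
            values[i] = (byte) values[i];
        }
    }

    public static void main(String[] args) {
        // Same values as the test case in the issue description.
        long[] t1 = {-104, -112, 54};
        long[] t2 = {25, 24, 9};
        long[] diff = new long[t1.length];

        subtract(t1, t2, diff, diff.length);
        wrapToTinyint(diff, diff.length);

        System.out.println(Arrays.toString(diff)); // prints [127, 120, 45]
    }
}
```

Since the second pass reads the array the first pass just wrote, it should mostly hit data that is already in cache, which is the hit rate worth confirming in the benchmark.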
> Vectorized execution handles overflows in a different manner than
> non-vectorized execution
> ------------------------------------------------------------------------------------------
>
> Key: HIVE-18421
> URL: https://issues.apache.org/jira/browse/HIVE-18421
> Project: Hive
> Issue Type: Bug
> Components: Vectorization
> Affects Versions: 2.1.1, 2.2.0, 3.0.0, 2.3.2
> Reporter: Vihang Karajgaonkar
> Assignee: Vihang Karajgaonkar
> Priority: Major
> Attachments: HIVE-18421.01.patch, HIVE-18421.02.patch,
> HIVE-18421.03.patch, HIVE-18421.04.patch, HIVE-18421.05.patch,
> HIVE-18421.06.patch, HIVE-18421.07.patch
>
>
> In vectorized execution, arithmetic operations that cause integer overflows
> can give wrong results. The issue is reproducible with both ORC and Parquet.
> Simple test case to reproduce this issue:
> {noformat}
> set hive.vectorized.execution.enabled=true;
> create table parquettable (t1 tinyint, t2 tinyint) stored as parquet;
> insert into parquettable values (-104, 25), (-112, 24), (54, 9);
> select t1, t2, (t1-t2) as diff from parquettable where (t1-t2) < 50 order by diff desc;
> +-------+-----+-------+
> | t1 | t2 | diff |
> +-------+-----+-------+
> | -104 | 25 | 127 |
> | -112 | 24 | 120 |
> | 54 | 9 | 45 |
> +-------+-----+-------+
> {noformat}
> When vectorization is turned off, the same query produces only one row.
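
One plausible reading of the row counts above, sketched in plain Java rather than taken from Hive's code: if the filter compares a difference that is still held in a wider type, all three rows pass, whereas a difference already wrapped to tinyint passes only for the last row. The class and method names (FilterOverflowSketch, countWide, countNarrow) are hypothetical, and the thread does not confirm that this is the exact mechanism.

```java
public class FilterOverflowSketch {

    /** Counts rows passing (t1 - t2) < 50 when the difference is kept as a long. */
    public static int countWide(byte[] t1, byte[] t2) {
        int passed = 0;
        for (int i = 0; i < t1.length; i++) {
            long diff = (long) t1[i] - t2[i]; // -129 and -136 are kept as-is
            if (diff < 50) {
                passed++;
            }
        }
        return passed;
    }

    /** Counts rows passing (t1 - t2) < 50 when the difference is wrapped to tinyint first. */
    public static int countNarrow(byte[] t1, byte[] t2) {
        int passed = 0;
        for (int i = 0; i < t1.length; i++) {
            byte diff = (byte) (t1[i] - t2[i]); // -129 wraps to 127, -136 wraps to 120
            if (diff < 50) {
                passed++;
            }
        }
        return passed;
    }

    public static void main(String[] args) {
        byte[] t1 = {-104, -112, 54};
        byte[] t2 = {25, 24, 9};
        System.out.println(countWide(t1, t2));   // prints 3
        System.out.println(countNarrow(t1, t2)); // prints 1
    }
}
```

With the wrapped values, only the row (54, 9, 45) satisfies the predicate, which matches the single row reported when vectorization is off.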