[
https://issues.apache.org/jira/browse/HIVE-18421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16333621#comment-16333621
]
Vihang Karajgaonkar commented on HIVE-18421:
--------------------------------------------
I think one easy way to solve this is to cast the values in the long vector to
the outputType. Based on my testing it works (at least for the arithmetic
expressions I tested). I am currently compiling a list of the expressions
affected by this issue. Any thoughts on the cast operator? I can run some
benchmarks, if there are automated tests for that in the source code, to
verify how much the cast affects performance. Here is the snippet that solves
the problem when I add it. For example, if the outputType is {{int}}, add the
following code after the expression is evaluated on the LongColumnVector.
{code}
+ // int
+ if (v.isRepeating) {
+   v.vector[0] = (int) v.vector[0];
+ } else if (selectedInUse) {
+   for (int j = 0; j != n; j++) {
+     int i = sel[j];
+     v.vector[i] = (int) v.vector[i];
+   }
+ } else {
+   for (int i = 0; i != n; i++) {
+     v.vector[i] = (int) v.vector[i];
+   }
+ }
{code}
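To make the effect of the cast easier to see outside of Hive, here is a minimal standalone sketch of the same three branches. {{NarrowingSketch}}, {{LongVec}} and {{narrowToInt}} are hypothetical names made up for this illustration; {{LongVec}} only mimics the two fields the snippet touches and is not the real LongColumnVector.

```java
// Standalone illustration only: LongVec is a hypothetical stand-in that models
// just the two fields used by the snippet, not Hive's LongColumnVector.
final class NarrowingSketch {
    static final class LongVec {
        final long[] vector;
        boolean isRepeating;

        LongVec(long[] vector) {
            this.vector = vector;
        }
    }

    /** Wraps each live entry to the 32-bit int range, in place. */
    static void narrowToInt(LongVec v, boolean selectedInUse, int[] sel, int n) {
        if (v.isRepeating) {
            // A repeating vector carries its single value in slot 0.
            v.vector[0] = (int) v.vector[0];
        } else if (selectedInUse) {
            // Only the rows listed in sel[0..n) are live.
            for (int j = 0; j < n; j++) {
                int i = sel[j];
                v.vector[i] = (int) v.vector[i];
            }
        } else {
            for (int i = 0; i < n; i++) {
                v.vector[i] = (int) v.vector[i];
            }
        }
    }

    public static void main(String[] args) {
        // 2147483648L is Integer.MAX_VALUE + 1 computed in 64-bit arithmetic.
        LongVec v = new LongVec(new long[] {2147483648L, 7L});
        narrowToInt(v, false, null, 2);
        System.out.println(v.vector[0] + " " + v.vector[1]); // -2147483648 7
    }
}
```

The narrowing cast followed by the implicit widening back to {{long}} is what makes the stored value wrap around the way 32-bit arithmetic would.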
I think the good news here is that, AFAIK, only the supportedGenericUDFs in
the vectorizer are affected, because the rest use {{VectorUDFAdapter}}, which
should not be affected by this issue. That makes the scope of the problem
smaller than previously thought. However, we will have to be careful when
adding new UDFs to the supported list.
[~gopalv] [~mmccline] any thoughts on downcasting the values in the column
vector as the snippet above does?
> Vectorized execution does not handle integer overflows
> ------------------------------------------------------
>
> Key: HIVE-18421
> URL: https://issues.apache.org/jira/browse/HIVE-18421
> Project: Hive
> Issue Type: Bug
> Components: Vectorization
> Affects Versions: 2.1.1, 2.2.0, 3.0.0, 2.3.2
> Reporter: Vihang Karajgaonkar
> Assignee: Vihang Karajgaonkar
> Priority: Major
>
> In vectorized execution, arithmetic operations that cause integer overflows
> can give wrong results. The issue is reproducible with both ORC and Parquet.
> Simple test case to reproduce this issue:
> {noformat}
> set hive.vectorized.execution.enabled=true;
> create table parquettable (t1 tinyint, t2 tinyint) stored as parquet;
> insert into parquettable values (-104, 25), (-112, 24), (54, 9);
> select t1, t2, (t1-t2) as diff from parquettable where (t1-t2) < 50 order by
> diff desc;
> +-------+-----+-------+
> | t1 | t2 | diff |
> +-------+-----+-------+
> | -104 | 25 | 127 |
> | -112 | 24 | 120 |
> | 54 | 9 | 45 |
> +-------+-----+-------+
> {noformat}
> When vectorization is turned off the same query produces only one row.
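The arithmetic behind the test case can be checked in isolation. This is only a sketch under two assumptions: that the vectorized filter compares the unwrapped 64-bit difference, and that the non-vectorized path wraps the difference to the tinyint range (which is consistent with the single row reported). {{TinyintOverflowSketch}}, {{wideDiff}} and {{narrowDiff}} are hypothetical names, not Hive code.

```java
// Illustration of the repro arithmetic; names are hypothetical, not Hive APIs.
final class TinyintOverflowSketch {
    /** Difference as a 64-bit slot would hold it, with no wrap-around. */
    static long wideDiff(byte t1, byte t2) {
        return (long) t1 - (long) t2;
    }

    /** Difference wrapped to the 8-bit tinyint range. */
    static byte narrowDiff(byte t1, byte t2) {
        return (byte) (t1 - t2);
    }

    public static void main(String[] args) {
        byte[][] rows = {{-104, 25}, {-112, 24}, {54, 9}};
        for (byte[] r : rows) {
            long wide = wideDiff(r[0], r[1]);
            byte narrow = narrowDiff(r[0], r[1]);
            // Print each value next to the outcome of the "< 50" predicate.
            System.out.println("wide=" + wide + " passes=" + (wide < 50)
                + " | narrow=" + narrow + " passes=" + (narrow < 50));
        }
    }
}
```

With the wide values (-129, -136, 45) all three rows pass the {{< 50}} filter, while with the wrapped values (127, 120, 45) only the last row does, matching the two results described in the issue.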