[jira] [Commented] (HIVE-18421) Vectorized execution does not handle integer overflows

Vihang Karajgaonkar (JIRA) Sun, 21 Jan 2018 10:27:34 -0800

    [ 
https://issues.apache.org/jira/browse/HIVE-18421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16333621#comment-16333621
 ]


Vihang Karajgaonkar commented on HIVE-18421:
--------------------------------------------

I think one easy way to solve this is to cast the values in the long vector to 
the outputType. Based on my testing it works (at least for the arithmetic 
expressions I tested). I am currently compiling a list of expressions that 
are affected by this issue. Any thoughts on the cast operator? I can run some 
benchmarks, if there are automated tests for that in the source code, and 
verify how much the cast affects performance. Here is the code snippet that 
solves the problem when added. For example, if the outputType is {{int}}, 
add the following code after the expression is evaluated on the 
LongColumnVector. 

{code}
+      // outputType is int: truncate each long result to 32 bits
+      if (v.isRepeating) {
+        v.vector[0] = (int) v.vector[0];
+      } else if (selectedInUse) {
+        for (int j = 0; j != n; j++) {
+          int i = sel[j];
+          v.vector[i] = (int) v.vector[i];
+        }
+      } else {
+        for (int i = 0; i != n; i++) {
+          v.vector[i] = (int) v.vector[i];
+        }
+      }
{code}
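
As a standalone illustration of the snippet above, here is a minimal sketch 
that runs outside Hive. {{LongVec}} is a hypothetical stand-in for 
LongColumnVector that carries only the two fields the snippet touches, and 
{{narrowToInt}} is an illustrative name, not an existing Hive method. It only 
shows what the cast does to values that no longer fit in an {{int}}.

```java
import java.util.Arrays;

final class NarrowToIntSketch {

    /** Hypothetical stand-in for LongColumnVector (assumption, not Hive's class). */
    static final class LongVec {
        final long[] vector;
        boolean isRepeating;

        LongVec(long[] vector, boolean isRepeating) {
            this.vector = vector;
            this.isRepeating = isRepeating;
        }
    }

    /** Same three branches as the snippet: repeating, selected rows, dense batch. */
    static void narrowToInt(LongVec v, boolean selectedInUse, int[] sel, int n) {
        if (v.isRepeating) {
            // A repeating vector keeps its single value in slot 0.
            v.vector[0] = (int) v.vector[0];
        } else if (selectedInUse) {
            // Only the rows listed in sel[0..n) are live.
            for (int j = 0; j != n; j++) {
                int i = sel[j];
                v.vector[i] = (int) v.vector[i];
            }
        } else {
            for (int i = 0; i != n; i++) {
                v.vector[i] = (int) v.vector[i];
            }
        }
    }

    public static void main(String[] args) {
        // Integer.MAX_VALUE + 1 and Integer.MIN_VALUE - 1, computed as longs.
        LongVec v = new LongVec(new long[] {2147483648L, 5L, -2147483649L}, false);
        narrowToInt(v, false, null, 3);
        System.out.println(Arrays.toString(v.vector));
        // prints [-2147483648, 5, 2147483647]
    }
}
```

The cast wraps each value the same way 32-bit arithmetic would have, so the 
vector ends up holding what a row-by-row {{int}} computation would produce.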

I think the good news here is that, AFAIK, only the supportedGenericUDFs in 
the vectorizer will be affected, because the rest will use 
{{VectorUDFAdapter}}, which should not be affected by this issue. That makes 
the scope of the problem smaller than earlier thought. However, we will have 
to be careful when adding new UDFs to the supported list.

[~gopalv] [~mmccline] any thoughts on downcasting the values in the column 
vector, as the snippet above does?


> Vectorized execution does not handle integer overflows
> ------------------------------------------------------
>
>                 Key: HIVE-18421
>                 URL: https://issues.apache.org/jira/browse/HIVE-18421
>             Project: Hive
>          Issue Type: Bug
>          Components: Vectorization
>    Affects Versions: 2.1.1, 2.2.0, 3.0.0, 2.3.2
>            Reporter: Vihang Karajgaonkar
>            Assignee: Vihang Karajgaonkar
>            Priority: Major
>
> In vectorized execution, arithmetic operations that cause integer overflows 
> can give wrong results. The issue is reproducible with both ORC and Parquet.
> Simple test case to reproduce this issue:
> {noformat}
> set hive.vectorized.execution.enabled=true;
> create table parquettable (t1 tinyint, t2 tinyint) stored as parquet;
> insert into parquettable values (-104, 25), (-112, 24), (54, 9);
> select t1, t2, (t1-t2) as diff from parquettable where (t1-t2) < 50 order by 
> diff desc;
> +-------+-----+-------+
> |  t1   | t2  | diff  |
> +-------+-----+-------+
> | -104  | 25  | 127   |
> | -112  | 24  | 120   |
> | 54    | 9   | 45    |
> +-------+-----+-------+
> {noformat}
> When vectorization is turned off, the same query produces only one row.
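
The row counts in the test case above can be reproduced with plain Java 
arithmetic. The sketch below is one reading of the reported output, not a 
trace of Hive's code path: it assumes the vectorized filter compares the 
un-narrowed {{long}} difference, while the non-vectorized path compares the 
difference after it has wrapped to tinyint range. The class and method names 
are hypothetical.

```java
final class TinyintOverflowSketch {

    // (t1, t2) pairs from the test case in the issue description.
    static final long[][] ROWS = {{-104, 25}, {-112, 24}, {54, 9}};

    /** Counts rows where the raw long difference is below 50 (assumed vectorized behavior). */
    static int rowsPassingWithoutNarrowing() {
        int count = 0;
        for (long[] row : ROWS) {
            long diff = row[0] - row[1];          // -129, -136, 45
            if (diff < 50) {
                count++;
            }
        }
        return count;
    }

    /** Counts rows where the difference, wrapped to 8 bits, is below 50. */
    static int rowsPassingWithNarrowing() {
        int count = 0;
        for (long[] row : ROWS) {
            long diff = (byte) (row[0] - row[1]); // 127, 120, 45
            if (diff < 50) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(rowsPassingWithoutNarrowing()); // prints 3
        System.out.println(rowsPassingWithNarrowing());    // prints 1
    }
}
```

Under that assumption, -129 and -136 pass the {{< 50}} filter as longs and 
are later displayed as 127 and 120 once written out as tinyint, which matches 
the three-row result shown in the table.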



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
