Deepak Jaiswal commented on HIVE-18622:

I had a brief chat with Matt about this. Here is the problem as he explained it 
to me; I hope it helps, [~sershe].


 The problem is with ColumnVector reuse.
 For intermediate calculations we grab a (statically) allocated ColumnVector, 
generate output into it, pass it on to another vector expression, and then that 
ColumnVector is implicitly returned to be available for another vector 
expression.

The pattern people were using was outputColVector.noNulls = 
The problem is that the ColumnVector.reset() method *assumes* that if noNulls is 
true then all isNull entries are false.  And a huge amount of code was 
assuming the same thing: that it did not have to set isNull entries if 
noNulls is true.  The crux of the issue, though, is that the 
outputColVector.noNulls flag is basically corrupted if you set it from the 
inputColVector.
So if vector expression #1 sets one row as NULL by doing 
outputColVector.isNull[batchIndex] = true and outputColVector.noNulls = false, 
that works for the current expression.  But if the next vector expression #2 
reuses outputColVector and sets noNulls to true, we have an isNull array with a 
true lurking in it.  The output of #2 for that row will appear to other code as 
NULL, which is wrong.
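
To make the sequence above concrete, here is a minimal standalone sketch. It does 
not use Hive's classes: ScratchColumn, markNull, fillBuggy, fillSafe and 
staleNullAfterReuse are hypothetical names made up for illustration, and reset() 
only models the assumption described above (it skips clearing isNull when noNulls 
is true). The "safe" variant is one possible way to avoid the stale entry, not 
necessarily what the patch does.

```java
import java.util.Arrays;

public class ScratchColumnDemo {

    /** Hypothetical stand-in for a reusable scratch column; not Hive's ColumnVector. */
    static final class ScratchColumn {
        final long[] vector;
        final boolean[] isNull;
        boolean noNulls = true;

        ScratchColumn(int size) {
            vector = new long[size];
            isNull = new boolean[size];
        }

        /** Models the assumption: if noNulls is true, isNull is taken to be all false. */
        void reset() {
            if (!noNulls) {
                Arrays.fill(isNull, false);
            }
            noNulls = true;
        }
    }

    /** Marks one row NULL the way expression #1 does. */
    static void markNull(ScratchColumn out, int row) {
        out.isNull[row] = true;
        out.noNulls = false;
    }

    /** Expression #2, buggy: claims noNulls without clearing old isNull entries. */
    static void fillBuggy(ScratchColumn out) {
        out.noNulls = true; // the stale isNull[1] == true survives this
        for (int i = 0; i < out.vector.length; i++) {
            out.vector[i] = i * 10L;
        }
    }

    /** Expression #2, safer: clears isNull for every row it writes, leaves noNulls alone. */
    static void fillSafe(ScratchColumn out) {
        for (int i = 0; i < out.vector.length; i++) {
            out.vector[i] = i * 10L;
            out.isNull[i] = false;
        }
    }

    /** How downstream code typically decides whether a row is NULL. */
    static boolean isRowNull(ScratchColumn col, int row) {
        return !col.noNulls && col.isNull[row];
    }

    /** Returns true if row 1 wrongly shows up as NULL after the column is reused. */
    static boolean staleNullAfterReuse(boolean buggy) {
        ScratchColumn col = new ScratchColumn(4);
        markNull(col, 1);            // expression #1: row 1 is NULL
        if (buggy) {
            fillBuggy(col);          // expression #2 reuses the column
        } else {
            fillSafe(col);
        }
        col.reset();                 // trusts noNulls, may skip clearing isNull
        markNull(col, 3);            // a later expression marks a different row
        return isRowNull(col, 1);    // row 1 should not be NULL any more
    }

    public static void main(String[] args) {
        System.out.println("buggy reuse, row 1 NULL: " + staleNullAfterReuse(true));
        System.out.println("safe reuse, row 1 NULL: " + staleNullAfterReuse(false));
    }
}
```

With the buggy fill, row 1 comes back as NULL (true) once some later code flips 
noNulls to false; with the safe fill it does not (false).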

> Vectorization: IF Statements, Comparisons, and more do not handle NULLs 
> correctly
> ---------------------------------------------------------------------------------
>                 Key: HIVE-18622
>                 URL: https://issues.apache.org/jira/browse/HIVE-18622
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: Matt McCline
>            Assignee: Matt McCline
>            Priority: Critical
>             Fix For: 3.0.0
>         Attachments: HIVE-18622.03.patch, HIVE-18622.04.patch, 
> HIVE-18622.05.patch, HIVE-18622.06.patch, HIVE-18622.07.patch, 
> HIVE-18622.08.patch, HIVE-18622.09.patch, HIVE-18622.091.patch, 
> HIVE-18622.092.patch, HIVE-18622.093.patch, HIVE-18622.094.patch, 
> HIVE-18622.095.patch, HIVE-18622.096.patch
>  Many vector expression classes are setting noNulls to true which does not 
> work if the VRB is a scratch column being reused. The previous use may have 
> set noNulls to false and the isNull array will have some rows marked as NULL. 
> The result is wrong query results and sometimes NPEs (for BytesColumnVector).
> So, many vector expressions need this:
> {code:java}
>       // Carefully handle NULLs...
>       /*
>        * For better performance on LONG/DOUBLE we don't want the conditional
>        * statements inside the for loop.
>        */
>       outputColVector.noNulls = false;
>  {code}
> And, vector expressions need to make sure the isNull array entry is set when 
> outputColVector.noNulls is false.
> And all places that assign a column value need to set noNulls to false when the 
> value is NULL.
> Almost all cases where noNulls is set to true are incorrect.
