[ 
https://issues.apache.org/jira/browse/HIVE-18622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16363446#comment-16363446
 ] 

Sergey Shelukhin edited comment on HIVE-18622 at 2/14/18 3:45 AM:
------------------------------------------------------------------

I looked at most of the iteration 7 on RB (pages 10-13 remaining to go thru), 
and 7-8 diff.
There's one correctness issue that I found, with 0/i mixup.
Some removals of setting isNull are not clear to me.
My main concern is that either I'm missing the big picture, or there is no 
unified semantic approach to noNulls, and to the pattern of setting and 
unsetting it.
The idea of noNulls is to avoid looking at isNull, so it's not clear why 
certain places in the code that fill the meaningful parts of the batch with 
non-nulls still don't set noNulls (I commented on one or two, there are many). 
Seems like noNulls should be set every time there are no nulls, and isNull 
array should be set correctly by whoever sets noNulls to false.
So, an example uniform approach could be:
0) The only parts of the batch that matter are those included in batch size.
1) set isNull to false every time we set a non-null value and noNulls is not 
true.
2) when flipping noNulls to false, make sure that isNull is correct; when 
flipping it as part of larger loop that always sets isNull unconditionally 
(e.g. in TreeReader::nextVector in ORC) no additional action necessary; when 
looping thru a bunch of non-nulls and finding a null it may be necessary to 
backfill the false-s in the preceding values and rely on (1) to fill the 
following values once noNulls is flipped; when setting individual elements 
without context it may be necessary to fill the array entirely (done only once 
when actually flipping noNulls).

Right now it seems like some places are too conservative and fill isNull even 
when noNulls is true; and some actually remove isNull-setting when setting 
values to non-nulls, without checking for noNulls (so, presumably if noNulls is 
true isNull could be incorrect and it's not clear the next set that happens to 
be a null doesn't flip noNulls and renders the previous value invalid.
Or at least the pattern in these approaches is not clear to me - could be 
because the patch is so large; perhaps it should be described somewhere in one 
place.




was (Author: sershe):
I looked at most of the iteration 7 on RB (pages 10-13 remaining to go thru), 
and 7-8 diff.
There's one correctness issue that I found, with 0/i mixup.
Some removals of setting isNull are not clear to me.
My main concern is that either I'm missing the big picture, or there is no 
unified semantic approach to noNulls, and to the pattern of setting and 
unsetting it.
The idea of noNulls is to avoid looking at isNull, so it's not clear why 
certain places in the code that fill the meaningful parts of the batch with 
non-nulls still don't set noNulls (I commented on one or two, there are many). 
Seems like noNulls should be set every time there are noNulls, and isNull array 
should be set correctly by whoever sets noNulls to false.
So, the approach could be:
1) set isNull to false every time we set a non-null value and noNulls is not 
true.
2) when flipping noNulls to false, make sure that isNull is correct; when 
flipping it as part of larger loop that always sets isNull unconditionally 
(e.g. in TreeReader::nextVector in ORC) it's not necessary; when looping thru a 
bunch of non-nulls and finding a null it may be necessary to backfill the 
false-s in the preceding values and rely on (1) to fill the following values 
once noNulls is flipped; when setting individual elements without context it 
may be necessary to fill the array entirely (done only once when actually 
flipping noNulls).

Right now it seems like some places are too conservative and fill isNull even 
when noNulls is true; and some actually remove isNull-setting when setting 
values to non-nulls, without checking for noNulls (so, presumably if noNulls is 
true isNull could be incorrect and it's not clear the next set that happens to 
be a null doesn't flip noNulls and renders the previous value invalid.
Or at least the pattern in these approaches is not clear to me - could be 
because the patch is so large; perhaps it should be described somewhere in one 
place.



> Vectorization: IF Statements, Comparisons, and more do not handle NULLs 
> correctly
> ---------------------------------------------------------------------------------
>
>                 Key: HIVE-18622
>                 URL: https://issues.apache.org/jira/browse/HIVE-18622
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: Matt McCline
>            Assignee: Matt McCline
>            Priority: Critical
>             Fix For: 3.0.0
>
>         Attachments: HIVE-18622.03.patch, HIVE-18622.04.patch, 
> HIVE-18622.05.patch, HIVE-18622.06.patch, HIVE-18622.07.patch, 
> HIVE-18622.08.patch, HIVE-18622.09.patch, HIVE-18622.091.patch, 
> HIVE-18622.092.patch, HIVE-18622.093.patch, HIVE-18622.094.patch, 
> HIVE-18622.095.patch, HIVE-18622.096.patch
>
>
>  
>  Many vector expression classes are setting noNulls to true which does not 
> work if the VRB is a scratch column being reused. The previous use may have 
> set noNulls to false and the isNull array will have some rows marked as NULL. 
> The result is wrong query results and sometimes NPEs (for BytesColumnVector).
> So, many vector expressions need this:
> {code:java}
>       // Carefully handle NULLs...
>       /*
>        * For better performance on LONG/DOUBLE we don't want the conditional
>        * statements inside the for loop.
>        */
>       outputColVector.noNulls = false;
>  {code}
> And, vector expressions need to make sure the isNull array entry is set when 
> outputColVector.noNulls is false.
> And, all place that assign column value need to set noNulls to false when the 
> value is NULL.
> Almost all cases where noNulls is set to true are incorrect.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to