[
https://issues.apache.org/jira/browse/HIVE-18622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16363446#comment-16363446
]
Sergey Shelukhin commented on HIVE-18622:
-----------------------------------------
I looked at most of the iteration 7 on RB (pages 10-13 remaining to go thru),
and 7-8 diff.
There's one correctness issue that I found, with 0/i mixup.
Some removals of setting isNull are not clear to me.
My main concern is that either I'm missing the big picture, or there is no
unified semantic approach to noNulls, and to the pattern of setting and
unsetting it.
The idea of noNulls is to avoid looking at isNull, so it's not clear why
certain places in the code that fill the meaningful parts of the batch with
non-nulls still don't set noNulls (I commented on one or two, there are many).
Seems like noNulls should be set every time there are noNulls, and isNull array
should be set correctly by whoever sets noNulls to false.
So, the approach could be:
1) set isNull to false every time we set a non-null value and noNulls is not
true.
2) when flipping noNulls to false, make sure that isNull is correct; when
flipping it as part of larger loop that always sets isNull unconditionally
(e.g. in TreeReader::nextVector in ORC) it's not necessary; when looping thru a
bunch of non-nulls and finding a null it may be necessary to backfill the
false-s in the preceding values and rely on (1) to fill the following values
once noNulls is flipped; when setting individual elements without context it
may be necessary to fill the array entirely (done only once when actually
flipping noNulls).
Right now it seems like some places are too conservative and fill isNull even
when noNulls is true; and some actually remove isNull-setting when setting
values to non-nulls, without checking for noNulls (so, presumably if noNulls is
true isNull could be incorrect and it's not clear the next set that happens to
be a null doesn't flip noNulls and renders the previous value invalid.
Or at least the pattern in these approaches is not clear to me - could be
because the patch is so large; perhaps it should be described somewhere in one
place.
> Vectorization: IF Statements, Comparisons, and more do not handle NULLs
> correctly
> ---------------------------------------------------------------------------------
>
> Key: HIVE-18622
> URL: https://issues.apache.org/jira/browse/HIVE-18622
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Reporter: Matt McCline
> Assignee: Matt McCline
> Priority: Critical
> Fix For: 3.0.0
>
> Attachments: HIVE-18622.03.patch, HIVE-18622.04.patch,
> HIVE-18622.05.patch, HIVE-18622.06.patch, HIVE-18622.07.patch,
> HIVE-18622.08.patch, HIVE-18622.09.patch, HIVE-18622.091.patch,
> HIVE-18622.092.patch, HIVE-18622.093.patch, HIVE-18622.094.patch,
> HIVE-18622.095.patch, HIVE-18622.096.patch
>
>
>
> Many vector expression classes are setting noNulls to true which does not
> work if the VRB is a scratch column being reused. The previous use may have
> set noNulls to false and the isNull array will have some rows marked as NULL.
> The result is wrong query results and sometimes NPEs (for BytesColumnVector).
> So, many vector expressions need this:
> {code:java}
> // Carefully handle NULLs...
> /*
> * For better performance on LONG/DOUBLE we don't want the conditional
> * statements inside the for loop.
> */
> outputColVector.noNulls = false;
> {code}
> And, vector expressions need to make sure the isNull array entry is set when
> outputColVector.noNulls is false.
> And, all place that assign column value need to set noNulls to false when the
> value is NULL.
> Almost all cases where noNulls is set to true are incorrect.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)