[
https://issues.apache.org/jira/browse/HIVE-19493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16471354#comment-16471354
]
Vihang Karajgaonkar commented on HIVE-19493:
--------------------------------------------
Thanks [~mmccline] for the review. Updated the diff with the suggested change.
Actually I am having hard time to understand the complete logic of when noNulls
needs to be set to {{true}}. For example,
in the below code snippet from {{copySelected}} method
{code:java}
// Handle repeating case
if (input.isRepeating) {
if (input.noNulls || !input.isNull[0]) {
String string = new String(input.vector[0], input.start[0],
input.length[0]);
try {
date.setTime(formatter.parse(string).getTime());
output.vector[0] = DateWritable.dateToDays(date);
output.isNull[0] = false;
} catch (ParseException e) {
output.isNull[0] = true;
output.noNulls = false;
}
} else {
output.isNull[0] = true;
output.noNulls = false;
}
output.isRepeating = true;
return;
}
{code}
Can you please help me understand the following?
* Why do we *not* set output.notNulls = true in the {{if (input.noNulls ||
!input.isNull[0])}} code block when we know that {{input.isRepeating == true}}?
Shouldn't we be setting it to true and reset output.isNull[] array to all false
values? My guess is that we are not doing this for performance reasons since it
doesn't make sense to reset the whole {{output.isNull[]}} when we know that we
only need to look for the first element. Just want to confirm if this
understanding is correct (may be even add a comment so that its easy to
remember next time)
* As a corollary of the above statement it means both the following conditions
are valid but they represent the same state of the columnVector.
{{vector.isRepeating == true && vector.noNulls == true}} --> vector has a
non-null repeating value
{{vector.isRepeating == true && vector.noNulls == false && vector.isNull[0] ==
false}} --> vector has non-null repeating value. Is this understanding correct?
* In case of {{input.isRepeating == false && selected == true}} case when
{{input.noNulls == true && output.noNull == false}} why don't we set
{{output.noNull = true}}. Is it because we think that when {{selected == true}}
there may be less number of rows to be updated and hence its unnecessary work
to reset {{output.isNull[]}} which we have to do everytime when set
{{output.noNulls = true}}?
I think the bug which was fixed in HIVE-18622 manifests when the
{{output.noNulls}} is flipped from {{false to true}} and when the {{isNull}}
array has some entries which are {{true}}. So may be we create an expression
such that the columnVector is reused and this flag gets flipped. Just adding a
row of nulls may not exercise the code which has the bug. Let me see if I can
create an expression to test this particular issue.
> VectorUDFDateDiffColCol copySelected does not handle nulls correctly
> --------------------------------------------------------------------
>
> Key: HIVE-19493
> URL: https://issues.apache.org/jira/browse/HIVE-19493
> Project: Hive
> Issue Type: Bug
> Components: Vectorization
> Reporter: Vihang Karajgaonkar
> Assignee: Vihang Karajgaonkar
> Priority: Major
> Attachments: HIVE-19493.01.patch, HIVE-19493.02.patch
>
>
> The {{copySelected}} method in {{VectorUDFDateDiffColCol}} class was missed
> during HIVE-18622
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)