[
https://issues.apache.org/jira/browse/HIVE-18524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16375323#comment-16375323
]
Ke Jia commented on HIVE-18524:
-------------------------------
[~mmccline]:
{quote}Before we add any more vectorization features like this one I'd like to
see a better testing framework. HIVE-18622 was a huge fix to many vector
expressions that were handling NULLs incorrectly. In particular,
IfExprColumnNull and IfExprNullColumn had bugs. By better framework, I like to
see all the different UDFs generated with expressions for both row- and vector-
and have data will NULLs generated and driven through those expressions and the
results compared. I think a framework like this would have found the bug that
resulted in the revert. And, also would have found many of the problems fixed
by HIVE-18622.
{quote}Now, we have passed all the origin qtest and the new added qtest I can
think of . If you have comprehensive test cases, Please tell me, thanks.
{quote}Vector expressions need to be created in the VectorizationContext class
and not as a special case in VectorUDFAdaptor. VectorizationContext
instantiates vector expressions so they have proper TypeInfo and can be
displayed with EXPLAIN VECTORIZATION.
{quote} I will try to move the creation of Vector expression code in
VectorUDFAdaptor to VectorizationContext .
{quote}What I don't understand is why an IF expr with computed THEN and/or ELSE
values isn't just another vector expression. I may be missing something. I
certainly see it is more sophisticated in that you want to avoid executing any
THEN expressions whose IF expr row value is false and similarly avoid executing
any ELSE expressions whose IF expr is true.
{quote}Case When Else expression is translated to If expression in HIVE-16731.
If I have wrong understanding your question, Please tell me, thanks.
{quote}Usually, rather than add special cases to existing vector expressions we
use GenVectorCode to generate with templates new class variations. We try to
avoid complicated base classes that have lots of decision logic. That was my
impression when I read the code a while ago. I could be wrong but even though
it may seem redudant we generally have vectorization code variation just focus
on the specific variation they are executing.
{quote}The special case only focus on the row- expression
(VectorUDFAdaptor.java) to solve the GenericUDFIf case and for the vector-
expression, I think it can apply to all the generally case. If I have wrong
understanding, Please tell me.
> Vectorization: Execution failure related to non-standard embedding of
> IfExprConditionalFilter inside VectorUDFAdaptor (Revert HIVE-17139)
> -----------------------------------------------------------------------------------------------------------------------------------------
>
> Key: HIVE-18524
> URL: https://issues.apache.org/jira/browse/HIVE-18524
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 3.0.0
> Reporter: Matt McCline
> Assignee: Matt McCline
> Priority: Critical
> Fix For: 3.0.0
>
> Attachments: HIVE-18524.01.patch, HIVE-18524.02.patch
>
>
> {noformat}
> insert overwrite table insert_10_1
> select cast(gpa as float),
> age,
> IF(age>40,cast('2011-01-01 01:01:01' as timestamp),NULL),
> IF(LENGTH(name)>10,cast(name as binary),NULL)
> from studentnull10k
> vectorizationSchemaColumns: [0:name:string, 1:age:int, 2:gpa:double]
> ExprNodeDescs:
> UDFToFloat(gpa) (type: float),
> age (type: int),
> if((age > 40), 2011-01-01 01:01:01.0, null) (type: timestamp),
> if((length(name) > 10), CAST( name AS BINARY), null) (type: binary)
> selectExpressions:
> VectorUDFAdaptor(if((age > 40), 2011-01-01 01:01:01.0, null))
> (children: LongColGreaterLongScalar(col 1:int, val 40) -> 4:boolean)
> -> 5:timestamp,
> VectorUDFAdaptor(if((length(name) > 10), CAST( name AS BINARY), null))
> (children: LongColGreaterLongScalar(col 4:int, val 10)(children:
> StringLength(col 0:string) -> 4:int) -> 6:boolean,
> VectorUDFAdaptor(CAST( name AS BINARY)) -> 7:binary) -> 8:binary
> {noformat}
> *// Notice there is no vector expression shown for the last IF stmt.* It has
> been magically embedded inside the VectorUDFAdaptor object...
> Execution results in this call stack.
> {nocode}
> Caused by: java.lang.NullPointerException
> at java.util.Arrays.copyOfRange(Arrays.java:3521)
> at
> org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$9.writeValue(VectorExpressionWriterFactory.java:1101)
> at
> org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$VectorExpressionWriterBytes.writeValue(VectorExpressionWriterFactory.java:343)
> at
> org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFArgDesc.getDeferredJavaObject(VectorUDFArgDesc.java:123)
> at
> org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.setResult(VectorUDFAdaptor.java:211)
> at
> org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.evaluate(VectorUDFAdaptor.java:177)
> at
> org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:145)
> ... 22 more
> {nocode}
> Change is due to:
> HIVE-17139: Conditional expressions optimization: skip the expression
> evaluation if the condition is not satisfied for vectorization engine. (Jia
> Ke, reviewed by Ferdinand Xu)
> Embedding a raw vector expression outside of VectorizationContext is quite
> non-standard and evidently buggy.
> [~Ferd] [~Ke Jia] I am inclined to revert this change. Comments? CC:
> [~ashutoshc] [~hagleitn]
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)