Ke Jia commented on HIVE-18524:


{quote}Before we add any more vectorization features like this one I'd like to 
see a better testing framework.  HIVE-18622 was a huge fix to many vector 
expressions that were handling NULLs incorrectly.  In particular, 
IfExprColumnNull and IfExprNullColumn had bugs.  By better framework, I like to 
see all the different UDFs generated with expressions for both row- and vector- 
and have data will NULLs generated and driven through those expressions and the 
results compared.  I think a framework like this would have found the bug that 
resulted in the revert.  And, also would have found many of the problems fixed 
by HIVE-18622.

{quote}Now, we have passed all the origin qtest and the new added qtest I can 
think of . If you have comprehensive test cases, Please tell me, thanks.

{quote}Vector expressions need to be created in the VectorizationContext class 
and not as a special case in VectorUDFAdaptor.  VectorizationContext 
instantiates vector expressions so they have proper TypeInfo and can be 

{quote} I will try to move the creation of Vector expression code in 
VectorUDFAdaptor to VectorizationContext .

{quote}What I don't understand is why an IF expr with computed THEN and/or ELSE 
values isn't just another vector expression.  I may be missing something.  I 
certainly see it is more sophisticated in that you want to avoid executing any 
THEN expressions whose IF expr row value is false and similarly avoid executing 
any ELSE expressions whose IF expr is true.

{quote}Case When Else expression is translated to If expression in HIVE-16731. 
If I have wrong understanding your question, Please tell me, thanks.

{quote}Usually, rather than add special cases to existing vector expressions we 
use GenVectorCode to generate with templates new class variations.  We try to 
avoid complicated base classes that have lots of decision logic.  That was my 
impression when I read the code a while ago.  I could be wrong but even though 
it may seem redudant we generally have vectorization code variation just focus 
on the specific variation they are executing.

{quote}The special case only focus on the row- expression 
(VectorUDFAdaptor.java) to solve the GenericUDFIf case and for the vector- 
expression, I think it can apply to all the generally case. If I have wrong 
understanding, Please tell me.

> Vectorization: Execution failure related to non-standard embedding of 
> IfExprConditionalFilter inside VectorUDFAdaptor (Revert HIVE-17139)
> -----------------------------------------------------------------------------------------------------------------------------------------
>                 Key: HIVE-18524
>                 URL: https://issues.apache.org/jira/browse/HIVE-18524
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 3.0.0
>            Reporter: Matt McCline
>            Assignee: Matt McCline
>            Priority: Critical
>             Fix For: 3.0.0
>         Attachments: HIVE-18524.01.patch, HIVE-18524.02.patch
> {noformat}
> insert overwrite table insert_10_1
>     select cast(gpa as float),
>            age,
>            IF(age>40,cast('2011-01-01 01:01:01' as timestamp),NULL),
>            IF(LENGTH(name)>10,cast(name as binary),NULL)
>     from studentnull10k
> vectorizationSchemaColumns: [0:name:string, 1:age:int, 2:gpa:double]
> ExprNodeDescs:
>     UDFToFloat(gpa) (type: float),
>     age (type: int),
>     if((age > 40), 2011-01-01 01:01:01.0, null) (type: timestamp),
>     if((length(name) > 10), CAST( name AS BINARY), null) (type: binary)
> selectExpressions:
>     VectorUDFAdaptor(if((age > 40), 2011-01-01 01:01:01.0, null))
>         (children: LongColGreaterLongScalar(col 1:int, val 40) -> 4:boolean) 
> -> 5:timestamp,
>     VectorUDFAdaptor(if((length(name) > 10), CAST( name AS BINARY), null))
>         (children: LongColGreaterLongScalar(col 4:int, val 10)(children: 
> StringLength(col 0:string) -> 4:int) -> 6:boolean,
>         VectorUDFAdaptor(CAST( name AS BINARY)) -> 7:binary) -> 8:binary
> {noformat}
> *// Notice there is no vector expression shown for the last IF stmt.*  It has 
> been magically embedded inside the VectorUDFAdaptor object...
> Execution results in this call stack.
> {nocode}
> Caused by: java.lang.NullPointerException
>       at java.util.Arrays.copyOfRange(Arrays.java:3521)
>       at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$9.writeValue(VectorExpressionWriterFactory.java:1101)
>       at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$VectorExpressionWriterBytes.writeValue(VectorExpressionWriterFactory.java:343)
>       at 
> org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFArgDesc.getDeferredJavaObject(VectorUDFArgDesc.java:123)
>       at 
> org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.setResult(VectorUDFAdaptor.java:211)
>       at 
> org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.evaluate(VectorUDFAdaptor.java:177)
>       at 
> org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:145)
>       ... 22 more
> {nocode}
> Change is due to:
> HIVE-17139: Conditional expressions optimization: skip the expression 
> evaluation if the condition is not satisfied for vectorization engine. (Jia 
> Ke, reviewed by Ferdinand Xu)
> Embedding a raw vector expression outside of VectorizationContext is quite 
> non-standard and evidently buggy.
> [~Ferd] [~Ke Jia] I am inclined to revert this change.  Comments?  CC: 
> [~ashutoshc] [~hagleitn]

This message was sent by Atlassian JIRA

Reply via email to