[ 
https://issues.apache.org/jira/browse/HIVE-18524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16374456#comment-16374456
 ] 

Matt McCline commented on HIVE-18524:
-------------------------------------

# Before we add any more vectorization features like this one I'd like to see a 
better testing framework.  HIVE-18622 was a huge fix to many vector expressions 
that were handling NULLs incorrectly.  In particular, IfExprColumnNull and 
IfExprNullColumn had bugs.  By better framework, I like to see all the 
different UDFs generated with expressions for both row- and vector- and have 
data will NULLs generated and driven through those expressions and the results 
compared.  I think a framework like this would have found the bug that resulted 
in the revert.  And, also would have found many of the problems fixed by 
HIVE-18622.
 # Vector expressions need to be created in the VectorizationContext class and 
not as a special case in VectorUDFAdaptor.  VectorizationContext instantiates 
vector expressions so they have proper TypeInfo and can be displayed with 
EXPLAIN VECTORIZATION.
 # What I don't understand is why an IF expr with computed THEN and/or ELSE 
values isn't just another vector expression.  I may be missing something.  I 
certainly see it is more sophisticated in that you want to avoid executing any 
THEN expressions whose IF expr row value is false and similarly avoid executing 
any ELSE expressions whose IF expr is true.
 # Usually, rather than add special cases to existing vector expressions we use 
GenVectorCode to generate with templates new class variations.  We try to avoid 
complicated base classes that have lots of decision logic.  That was my 
impression when I read the code a while ago.  I could be wrong but even though 
it may seem redudant we generally have vectorization code variation just focus 
on the specific variation they are executing.

> Vectorization: Execution failure related to non-standard embedding of 
> IfExprConditionalFilter inside VectorUDFAdaptor (Revert HIVE-17139)
> -----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-18524
>                 URL: https://issues.apache.org/jira/browse/HIVE-18524
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 3.0.0
>            Reporter: Matt McCline
>            Assignee: Matt McCline
>            Priority: Critical
>             Fix For: 3.0.0
>
>         Attachments: HIVE-18524.01.patch, HIVE-18524.02.patch
>
>
> {noformat}
> insert overwrite table insert_10_1
>     select cast(gpa as float),
>            age,
>            IF(age>40,cast('2011-01-01 01:01:01' as timestamp),NULL),
>            IF(LENGTH(name)>10,cast(name as binary),NULL)
>     from studentnull10k
> vectorizationSchemaColumns: [0:name:string, 1:age:int, 2:gpa:double]
> ExprNodeDescs:
>     UDFToFloat(gpa) (type: float),
>     age (type: int),
>     if((age > 40), 2011-01-01 01:01:01.0, null) (type: timestamp),
>     if((length(name) > 10), CAST( name AS BINARY), null) (type: binary)
> selectExpressions:
>     VectorUDFAdaptor(if((age > 40), 2011-01-01 01:01:01.0, null))
>         (children: LongColGreaterLongScalar(col 1:int, val 40) -> 4:boolean) 
> -> 5:timestamp,
>     VectorUDFAdaptor(if((length(name) > 10), CAST( name AS BINARY), null))
>         (children: LongColGreaterLongScalar(col 4:int, val 10)(children: 
> StringLength(col 0:string) -> 4:int) -> 6:boolean,
>         VectorUDFAdaptor(CAST( name AS BINARY)) -> 7:binary) -> 8:binary
> {noformat}
> *// Notice there is no vector expression shown for the last IF stmt.*  It has 
> been magically embedded inside the VectorUDFAdaptor object...
> Execution results in this call stack.
> {nocode}
> Caused by: java.lang.NullPointerException
>       at java.util.Arrays.copyOfRange(Arrays.java:3521)
>       at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$9.writeValue(VectorExpressionWriterFactory.java:1101)
>       at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$VectorExpressionWriterBytes.writeValue(VectorExpressionWriterFactory.java:343)
>       at 
> org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFArgDesc.getDeferredJavaObject(VectorUDFArgDesc.java:123)
>       at 
> org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.setResult(VectorUDFAdaptor.java:211)
>       at 
> org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.evaluate(VectorUDFAdaptor.java:177)
>       at 
> org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:145)
>       ... 22 more
> {nocode}
> Change is due to:
> HIVE-17139: Conditional expressions optimization: skip the expression 
> evaluation if the condition is not satisfied for vectorization engine. (Jia 
> Ke, reviewed by Ferdinand Xu)
> Embedding a raw vector expression outside of VectorizationContext is quite 
> non-standard and evidently buggy.
> [~Ferd] [~Ke Jia] I am inclined to revert this change.  Comments?  CC: 
> [~ashutoshc] [~hagleitn]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to