[
https://issues.apache.org/jira/browse/HIVE-18524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16374456#comment-16374456
]
Matt McCline commented on HIVE-18524:
-------------------------------------
# Before we add any more vectorization features like this one I'd like to see a
better testing framework. HIVE-18622 was a huge fix to many vector expressions
that were handling NULLs incorrectly. In particular, IfExprColumnNull and
IfExprNullColumn had bugs. By better framework, I like to see all the
different UDFs generated with expressions for both row- and vector- and have
data will NULLs generated and driven through those expressions and the results
compared. I think a framework like this would have found the bug that resulted
in the revert. And, also would have found many of the problems fixed by
HIVE-18622.
# Vector expressions need to be created in the VectorizationContext class and
not as a special case in VectorUDFAdaptor. VectorizationContext instantiates
vector expressions so they have proper TypeInfo and can be displayed with
EXPLAIN VECTORIZATION.
# What I don't understand is why an IF expr with computed THEN and/or ELSE
values isn't just another vector expression. I may be missing something. I
certainly see it is more sophisticated in that you want to avoid executing any
THEN expressions whose IF expr row value is false and similarly avoid executing
any ELSE expressions whose IF expr is true.
# Usually, rather than add special cases to existing vector expressions we use
GenVectorCode to generate with templates new class variations. We try to avoid
complicated base classes that have lots of decision logic. That was my
impression when I read the code a while ago. I could be wrong but even though
it may seem redudant we generally have vectorization code variation just focus
on the specific variation they are executing.
> Vectorization: Execution failure related to non-standard embedding of
> IfExprConditionalFilter inside VectorUDFAdaptor (Revert HIVE-17139)
> -----------------------------------------------------------------------------------------------------------------------------------------
>
> Key: HIVE-18524
> URL: https://issues.apache.org/jira/browse/HIVE-18524
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 3.0.0
> Reporter: Matt McCline
> Assignee: Matt McCline
> Priority: Critical
> Fix For: 3.0.0
>
> Attachments: HIVE-18524.01.patch, HIVE-18524.02.patch
>
>
> {noformat}
> insert overwrite table insert_10_1
> select cast(gpa as float),
> age,
> IF(age>40,cast('2011-01-01 01:01:01' as timestamp),NULL),
> IF(LENGTH(name)>10,cast(name as binary),NULL)
> from studentnull10k
> vectorizationSchemaColumns: [0:name:string, 1:age:int, 2:gpa:double]
> ExprNodeDescs:
> UDFToFloat(gpa) (type: float),
> age (type: int),
> if((age > 40), 2011-01-01 01:01:01.0, null) (type: timestamp),
> if((length(name) > 10), CAST( name AS BINARY), null) (type: binary)
> selectExpressions:
> VectorUDFAdaptor(if((age > 40), 2011-01-01 01:01:01.0, null))
> (children: LongColGreaterLongScalar(col 1:int, val 40) -> 4:boolean)
> -> 5:timestamp,
> VectorUDFAdaptor(if((length(name) > 10), CAST( name AS BINARY), null))
> (children: LongColGreaterLongScalar(col 4:int, val 10)(children:
> StringLength(col 0:string) -> 4:int) -> 6:boolean,
> VectorUDFAdaptor(CAST( name AS BINARY)) -> 7:binary) -> 8:binary
> {noformat}
> *// Notice there is no vector expression shown for the last IF stmt.* It has
> been magically embedded inside the VectorUDFAdaptor object...
> Execution results in this call stack.
> {nocode}
> Caused by: java.lang.NullPointerException
> at java.util.Arrays.copyOfRange(Arrays.java:3521)
> at
> org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$9.writeValue(VectorExpressionWriterFactory.java:1101)
> at
> org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$VectorExpressionWriterBytes.writeValue(VectorExpressionWriterFactory.java:343)
> at
> org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFArgDesc.getDeferredJavaObject(VectorUDFArgDesc.java:123)
> at
> org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.setResult(VectorUDFAdaptor.java:211)
> at
> org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.evaluate(VectorUDFAdaptor.java:177)
> at
> org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:145)
> ... 22 more
> {nocode}
> Change is due to:
> HIVE-17139: Conditional expressions optimization: skip the expression
> evaluation if the condition is not satisfied for vectorization engine. (Jia
> Ke, reviewed by Ferdinand Xu)
> Embedding a raw vector expression outside of VectorizationContext is quite
> non-standard and evidently buggy.
> [~Ferd] [~Ke Jia] I am inclined to revert this change. Comments? CC:
> [~ashutoshc] [~hagleitn]
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)