[
https://issues.apache.org/jira/browse/HIVE-12228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14979178#comment-14979178
]
Wenlei Xie commented on HIVE-12228:
-----------------------------------
This bug consists of two parts.
1. Without predicate pushdown, the query cannot be executed, because under certain conditions the associated column descriptor can be changed unexpectedly during Hive's query transformation. This was fixed in HIVE-7027. The essence of the patch is a [one-line fix|https://github.com/apache/hive/commit/cf2ad57d21a77ad4f6f2deb72a576b90275d6055#diff-e0b3d4ba0783fc2f31724d01aa6f65a7] that makes sure the column descriptors are copied.
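2. Hive 0.13.1 didn't perform predicate pushdown for {{UDF().xx}}. A single predicate of that form is a struct field access ({{ExprNodeFieldDesc}}) rather than a function call ({{ExprNodeGenericFuncDesc}}), so the following lines in [oah.hive.ql.ppd.OpProcFactory.createFilter|https://github.com/apache/hive/blob/branch-0.13/ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java#L706] return {{null}} and no pushed-down filter is created: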
{code}
ExprNodeDesc condn = ExprNodeDescUtils.mergePredicates(preds);
if(!(condn instanceof ExprNodeGenericFuncDesc)) {
return null;
}
{code}
This was fixed as a by-product of HIVE-7826; see the [changes to the OpProcFactory class|https://github.com/apache/hive/commit/98992da55cbe7b2647bad7bb69f9587c320c805a#diff-3e9f4827f7991810570353623f74f3d9].
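To illustrate why copying matters in part 1, here is a minimal Java sketch of the aliasing problem. {{ColumnInfoSketch}}, {{DescriptorAliasingDemo}} and its methods are hypothetical stand-ins, not Hive's {{ColumnInfo}} and not the actual HIVE-7027 patch. When two operators share one mutable column descriptor, renaming it for one operator silently renames it for the other; giving the child its own copies keeps them independent.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified column descriptor (not Hive's ColumnInfo).
class ColumnInfoSketch {
    String internalName;

    ColumnInfoSketch(String internalName) {
        this.internalName = internalName;
    }

    // Copy constructor: the new descriptor shares no mutable state with the original.
    ColumnInfoSketch(ColumnInfoSketch other) {
        this.internalName = other.internalName;
    }
}

class DescriptorAliasingDemo {
    // Child schema reuses the parent's descriptor objects (aliasing).
    static List<ColumnInfoSketch> shareColumns(List<ColumnInfoSketch> parent) {
        return new ArrayList<>(parent); // new list, but the same descriptor objects
    }

    // Child schema built from copies of the parent's descriptors.
    static List<ColumnInfoSketch> copyColumns(List<ColumnInfoSketch> parent) {
        List<ColumnInfoSketch> copy = new ArrayList<>();
        for (ColumnInfoSketch c : parent) {
            copy.add(new ColumnInfoSketch(c));
        }
        return copy;
    }

    // Simulates a later transformation step renaming columns to _col0, _col1, ...
    static void renameToInternal(List<ColumnInfoSketch> columns) {
        for (int i = 0; i < columns.size(); i++) {
            columns.get(i).internalName = "_col" + i;
        }
    }

    public static void main(String[] args) {
        List<ColumnInfoSketch> shared = Collections.singletonList(new ColumnInfoSketch("teststr"));
        renameToInternal(shareColumns(shared));
        System.out.println(shared.get(0).internalName); // _col0: the parent changed too

        List<ColumnInfoSketch> copied = Collections.singletonList(new ColumnInfoSketch("teststr"));
        renameToInternal(copyColumns(copied));
        System.out.println(copied.get(0).internalName); // teststr: the parent is untouched
    }
}
```

Copying costs one small allocation per column, which is cheap compared with the hard-to-trace failures that shared mutable descriptors can cause.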
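The effect of the {{createFilter}} check in part 2 can be reproduced with a small, self-contained Java model. All types here ({{ExprNode}}, {{ColumnExpr}}, {{FuncExpr}}, {{FieldExpr}}) and {{PushdownModel}} are hypothetical simplifications of Hive's {{ExprNodeDesc}} hierarchy, not its real API. The model assumes that merging a single predicate returns it unchanged while merging several wraps them in an AND call.

```java
import java.util.List;

// Hypothetical, simplified expression tree (not Hive's ExprNodeDesc classes).
interface ExprNode {}

// A column reference, e.g. testStr.
final class ColumnExpr implements ExprNode {
    final String name;
    ColumnExpr(String name) { this.name = name; }
}

// A function call, e.g. boolUdf(testStr) or and(p1, p2); stands in for ExprNodeGenericFuncDesc.
final class FuncExpr implements ExprNode {
    final String func;
    final List<ExprNode> args;
    FuncExpr(String func, List<ExprNode> args) { this.func = func; this.args = args; }
}

// A struct field access, e.g. simplestruct(testStr).first; stands in for ExprNodeFieldDesc.
final class FieldExpr implements ExprNode {
    final ExprNode struct;
    final String field;
    FieldExpr(ExprNode struct, String field) { this.struct = struct; this.field = field; }
}

final class PushdownModel {
    // One predicate is returned as-is; several are combined with "and".
    static ExprNode mergePredicates(List<ExprNode> preds) {
        ExprNode merged = null;
        for (ExprNode p : preds) {
            merged = (merged == null) ? p : new FuncExpr("and", List.of(merged, p));
        }
        return merged;
    }

    // Mirrors the 0.13.1 check: only a function-call root yields a pushed-down filter.
    static ExprNode createFilter(List<ExprNode> preds) {
        ExprNode condn = mergePredicates(preds);
        if (!(condn instanceof FuncExpr)) {
            return null; // no filter is created, so nothing is pushed down
        }
        return condn;
    }

    public static void main(String[] args) {
        ExprNode col = new ColumnExpr("testStr");
        ExprNode boolPred = new FuncExpr("boolUdf", List.of(col));
        ExprNode structPred = new FieldExpr(new FuncExpr("simplestruct", List.of(col)), "first");

        System.out.println(createFilter(List.of(boolPred)) != null);   // true: pushed down
        System.out.println(createFilter(List.of(structPred)) != null); // false: not pushed down
    }
}
```

In this model the root type alone decides the outcome, which is why a Boolean UDF predicate is pushed down while the equivalent struct-field predicate is not.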
> Hive 0.13.1 Error for nested query with UDF returns Struct type
> ---------------------------------------------------------------
>
> Key: HIVE-12228
> URL: https://issues.apache.org/jira/browse/HIVE-12228
> Project: Hive
> Issue Type: Bug
> Components: Hive, Query Planning, UDF
> Affects Versions: 0.13.1
> Reporter: Wenlei Xie
> Attachments: SimpleStruct.java
>
>
> The following simple nested query, which uses a UDF that returns a struct, fails on Hive 0.13.1. The UDF's Java code is attached as {{SimpleStruct.java}}.
> {noformat}
> ADD JAR simplestruct.jar;
> CREATE TEMPORARY FUNCTION simplestruct AS 'test.SimpleStruct';
> SELECT *
> FROM (
> SELECT *
> FROM mytest
> ) subquery
> WHERE simplestruct(subquery.testStr).first;
> {noformat}
> The error message is
> {noformat}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"testint":1,"testname":"haha","teststr":"hehe"}
> 	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:549)
> 	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177)
> 	... 8 more
> Caused by: java.lang.RuntimeException: cannot find field teststr from [0:_col0, 1:_col1, 2:_col2]
> 	at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:415)
> 	at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:150)
> ..............................
> {noformat}
> The query works fine if we replace it with a UDF that returns a Boolean. Comparing the query plans, we note that when using the {{SimpleStruct}} UDF, the query plan is:
> TableScan
> Select Operator
> Filter Operator
> Select Operator
> {noformat}
> The first Select Operator renames the columns to {{_col0}}, {{_col1}}, {{_col2}}, so the Filter Operator above it can no longer find the column {{teststr}}, which causes the error. If we use a UDF that returns a Boolean instead, the query plan becomes:
> TableScan
> Filter Operator
> Select Operator
> {noformat}
> It looks like the query planner fails to push down the Filter Operator when the predicate is based on a UDF that returns a struct.
> This bug is fixed in Hive 1.2.1, but we cannot find the ticket that fixed it.
> Appendix:
> The table {{mytest}} is created as follows:
> {noformat}
> CREATE TABLE mytest (testInt INT, testName STRING, testStr STRING)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
> STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH 'test.txt' INTO TABLE mytest;
> {noformat}
> The file {{test.txt}} is a simple CSV file.
> {noformat}
> 1,haha,hehe
> 2,my,test
> {noformat}
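The failure mode in the quoted report (a filter left above the Select Operator that renamed the columns) can be modeled in a few lines of Java: resolving {{teststr}} by name works against the table's schema but fails against the renamed one. {{FieldLookupModel}} and {{findField}} are hypothetical; the sketch only reproduces the shape of the quoted error message and is not Hive's {{ObjectInspectorUtils}} implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a by-name struct field lookup (not Hive's ObjectInspectorUtils).
final class FieldLookupModel {
    // Returns the position of fieldName in fieldNames (case-insensitive), or throws
    // an error formatted like the one in the quoted stack trace.
    static int findField(String fieldName, List<String> fieldNames) {
        for (int i = 0; i < fieldNames.size(); i++) {
            if (fieldNames.get(i).equalsIgnoreCase(fieldName)) {
                return i;
            }
        }
        List<String> described = new ArrayList<>();
        for (int i = 0; i < fieldNames.size(); i++) {
            described.add(i + ":" + fieldNames.get(i));
        }
        throw new RuntimeException("cannot find field " + fieldName + " from " + described);
    }

    public static void main(String[] args) {
        // Schema seen by a filter placed directly above the TableScan.
        List<String> tableSchema = List.of("testint", "testname", "teststr");
        // Schema seen by a filter placed above the renaming Select Operator.
        List<String> renamedSchema = List.of("_col0", "_col1", "_col2");

        System.out.println(findField("teststr", tableSchema)); // 2
        try {
            findField("teststr", renamedSchema);
        } catch (RuntimeException e) {
            // cannot find field teststr from [0:_col0, 1:_col1, 2:_col2]
            System.out.println(e.getMessage());
        }
    }
}
```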
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)