[jira] [Commented] (HIVE-12228) Hive 0.13.1 Error for nested query with UDF returns Struct type

Wenlei Xie (JIRA) Wed, 28 Oct 2015 13:38:07 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-12228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14979178#comment-14979178
 ]


Wenlei Xie commented on HIVE-12228:
-----------------------------------

This bug consists of two parts. 

1. The query cannot be executed without predicate pushdown because the 
associated column descriptor can be changed unexpectedly during Hive query 
transformation under certain conditions. This was fixed in HIVE-7027. The 
essence of the patch is a [one-line 
fix|https://github.com/apache/hive/commit/cf2ad57d21a77ad4f6f2deb72a576b90275d6055#diff-e0b3d4ba0783fc2f31724d01aa6f65a7] 
that makes sure the column descriptors are copied.
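
To illustrate why copying matters, here is a minimal sketch of the general 
aliasing problem. It is not the actual Hive code or the HIVE-7027 patch: 
{{ColumnDesc}}, {{rewriteInPlace}} and both helper methods are hypothetical 
names, and the column names are only chosen to echo the error message in the 
issue description.

```java
public class ColumnCopySketch {
    /** Hypothetical, simplified column descriptor (not Hive's real class). */
    static final class ColumnDesc {
        private String column;

        ColumnDesc(String column) { this.column = column; }

        /** Copy constructor: the copy can be changed without touching the source. */
        ColumnDesc(ColumnDesc other) { this.column = other.column; }

        String getColumn() { return column; }

        void setColumn(String column) { this.column = column; }
    }

    /** Hypothetical rewrite step that mutates the descriptor it is handed. */
    static void rewriteInPlace(ColumnDesc desc, String newName) {
        desc.setColumn(newName);
    }

    /** The rewrite works on the same object the filter holds, so the filter changes too. */
    static String filterColumnWhenShared() {
        ColumnDesc filterColumn = new ColumnDesc("_col2");
        rewriteInPlace(filterColumn, "teststr");
        return filterColumn.getColumn();
    }

    /** The rewrite works on a copy, so the filter's descriptor stays intact. */
    static String filterColumnWhenCopied() {
        ColumnDesc filterColumn = new ColumnDesc("_col2");
        rewriteInPlace(new ColumnDesc(filterColumn), "teststr");
        return filterColumn.getColumn();
    }

    public static void main(String[] args) {
        System.out.println("shared: " + filterColumnWhenShared());
        System.out.println("copied: " + filterColumnWhenCopied());
    }
}
```

In the shared case the filter ends up referring to a column name that its 
input row no longer exposes, which is the same shape of failure as 
"cannot find field teststr from [0:_col0, 1:_col1, 2:_col2]".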

2. Hive 0.13.1 didn't perform predicate pushdown for {{UDF().xx}} because 
of the following lines in 
[oah.hive.ql.ppd.OpProcFactory.createFilter|https://github.com/apache/hive/blob/branch-0.13/ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java#L706]:
{code}
    ExprNodeDesc condn = ExprNodeDescUtils.mergePredicates(preds);
    if(!(condn instanceof ExprNodeGenericFuncDesc)) {
      return null;
    }
{code}

This was fixed as a by-product of HIVE-7826; see the [changes to the 
OpProcFactory 
class|https://github.com/apache/hive/commit/98992da55cbe7b2647bad7bb69f9587c320c805a#diff-3e9f4827f7991810570353623f74f3d9].
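
The effect of the guard quoted above can be sketched with simplified 
stand-in classes. This assumes that a struct field access such as 
{{UDF().xx}} is represented by a field-access node (ExprNodeFieldDesc in 
Hive, as far as I can tell) rather than by a function-call node. 
{{Column}}, {{FuncCall}}, {{FieldAccess}} and {{boolean_udf}} are hypothetical 
names for illustration, not the real Hive classes.

```java
import java.util.Arrays;
import java.util.List;

public class PushdownGuardSketch {
    /** Simplified stand-in for Hive's expression descriptors; NOT the real classes. */
    interface ExprNode {}

    /** Stand-in for a column reference. */
    static final class Column implements ExprNode {
        final String name;

        Column(String name) { this.name = name; }
    }

    /** Stand-in for a function or UDF call (the role of ExprNodeGenericFuncDesc). */
    static final class FuncCall implements ExprNode {
        final String name;
        final List<ExprNode> args;

        FuncCall(String name, ExprNode... args) {
            this.name = name;
            this.args = Arrays.asList(args);
        }
    }

    /** Stand-in for field access on a struct-valued expression, e.g. udf(col).first. */
    static final class FieldAccess implements ExprNode {
        final ExprNode struct;
        final String field;

        FieldAccess(ExprNode struct, String field) {
            this.struct = struct;
            this.field = field;
        }
    }

    /** Mirrors the guard: only a predicate whose root is a function call yields a filter. */
    static ExprNode createFilter(ExprNode condn) {
        if (!(condn instanceof FuncCall)) {
            return null; // no filter is created, so nothing is pushed down
        }
        return condn;
    }

    public static void main(String[] args) {
        // Predicate whose root is a UDF call returning Boolean.
        ExprNode boolUdf = new FuncCall("boolean_udf", new Column("teststr"));
        // Predicate whose root is a field access on a struct-returning UDF.
        ExprNode structField =
            new FieldAccess(new FuncCall("simplestruct", new Column("teststr")), "first");

        System.out.println("boolean UDF pushed down: " + (createFilter(boolUdf) != null));
        System.out.println("struct field pushed down: " + (createFilter(structField) != null));
    }
}
```

Under that assumption, the root of the predicate is the field access, so the 
guard returns null even though a UDF call sits directly below it.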



> Hive 0.13.1 Error for nested query with UDF returns Struct type
> ---------------------------------------------------------------
>
>                 Key: HIVE-12228
>                 URL: https://issues.apache.org/jira/browse/HIVE-12228
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive, Query Planning, UDF
>    Affects Versions: 0.13.1
>            Reporter: Wenlei Xie
>         Attachments: SimpleStruct.java
>
>
> The following simple nested query, which uses a UDF that returns a Struct, 
> fails on Hive 0.13.1. The UDF Java code is attached as {{SimpleStruct.java}}.
> {noformat}
> ADD JAR simplestruct.jar;
> CREATE TEMPORARY FUNCTION simplestruct AS 'test.SimpleStruct';
> SELECT *
>   FROM (
>     SELECT *
>     from mytest
>  ) subquery
> WHERE simplestruct(subquery.testStr).first
> {noformat}
> The error message is 
> {noformat}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row {"testint":1,"testname":"haha","teststr":"hehe"}
>         at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:549)
>         at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177)
>         ... 8 more
> Caused by: java.lang.RuntimeException: cannot find field teststr from 
> [0:_col0, 1:_col1, 2:_col2]
>         at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:415)
>         at 
> org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:150)
> ..............................
> {noformat}
> The query works fine if we replace the UDF with one that returns Boolean. 
> Comparing the query plans, we note that with the {{SimpleStruct}} UDF the 
> query plan is 
> {noformat}
>           TableScan
>             Select Operator
>               Filter Operator
>                 Select Operator
> {noformat}
> The first Select Operator renames the columns to {{_colK}}, which causes 
> this problem. If we use a UDF that returns Boolean, the query plan becomes 
> {noformat}
>           TableScan
>             Filter Operator
>               Select Operator
> {noformat}
> It looks like the query planner fails to push down the Filter Operator when 
> the predicate is based on a UDF that returns a Struct. 
> This bug is fixed in Hive 1.2.1, but we cannot find the ticket that fixed it.
> Appendix: 
> The table {{mytest}} is created in the following way
> {noformat}
> CREATE TABLE mytest(testInt INT, testName STRING, testStr STRING) ROW FORMAT 
> DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH 'test.txt' INTO TABLE mytest;
> {noformat}
> The file {{test.txt}} is a simple CSV file.
> {noformat}
> 1,haha,hehe
> 2,my,test
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
