[ https://issues.apache.org/jira/browse/HIVE-25549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17419289#comment-17419289 ]

Karen Coppage commented on HIVE-25549:
--------------------------------------

This can also result in an NPE:
{code:java}
create table row_number_test as select (posexplode(split(repeat("w,", 2), ","))) as (pos, col);
select row_number() over(partition by cast (pos as string)) from row_number_test;
{code}

results in:

{code:java}
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing vector batch (tag=0) (vectorizedVertexNum 1)
...
Caused by: java.lang.NullPointerException
  at org.apache.hadoop.hive.ql.exec.vector.expressions.MathExpr.writeLongToUTF8(MathExpr.java:136)
  at org.apache.hadoop.hive.ql.exec.vector.expressions.CastLongToString.func(CastLongToString.java:53)
  at org.apache.hadoop.hive.ql.exec.vector.expressions.LongToStringUnaryUDF.evaluate(LongToStringUnaryUDF.java:111)
  at org.apache.hadoop.hive.ql.exec.vector.ptf.VectorPTFOperator.process(VectorPTFOperator.java:381)
{code}

This happens because the orderExpressions don't get a transient init either.
(I don't quite understand why there are orderExpressions when the function has no ORDER BY clause, but that's beside the point.)
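A minimal, self-contained Java sketch of the failure mode and the fix suggested in this comment. It deliberately does not use Hive's classes: `SketchExpr`, `CastLongToStringSketch`, `PtfOperatorSketch` and their methods are hypothetical stand-ins. It assumes, as the stack trace suggests, that the long-to-string cast writes through a scratch buffer that is only allocated during transient init, so evaluating it without that init throws a NullPointerException. The operator's init step initializes both the partition and the order expressions.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for vectorized expressions; this is NOT Hive's API.
class TransientInitSketch {

    interface SketchExpr {
        // Allocates per-task transient state that is not serialized with the plan.
        void transientInit();

        // Evaluates one "batch" of longs into UTF-8 byte arrays.
        byte[][] evaluate(long[] input);
    }

    // Assumption: like the cast in the stack trace, this writes through a
    // scratch buffer that only exists after transientInit() has run.
    static final class CastLongToStringSketch implements SketchExpr {
        private transient byte[] temp;

        @Override
        public void transientInit() {
            temp = new byte[20]; // "-9223372036854775808" is the longest long: 20 chars
        }

        @Override
        public byte[][] evaluate(long[] input) {
            byte[][] out = new byte[input.length][];
            for (int i = 0; i < input.length; i++) {
                byte[] digits = Long.toString(input[i]).getBytes(StandardCharsets.UTF_8);
                // Throws NullPointerException when transientInit() was skipped (temp == null).
                System.arraycopy(digits, 0, temp, 0, digits.length);
                out[i] = Arrays.copyOf(temp, digits.length);
            }
            return out;
        }
    }

    // Hypothetical operator: the fix is to init partition AND order expressions.
    static final class PtfOperatorSketch {
        final List<SketchExpr> partitionExpressions = new ArrayList<>();
        final List<SketchExpr> orderExpressions = new ArrayList<>();

        void initializeOp() {
            for (SketchExpr e : partitionExpressions) {
                e.transientInit();
            }
            for (SketchExpr e : orderExpressions) {
                e.transientInit();
            }
        }
    }

    public static void main(String[] args) {
        CastLongToStringSketch cast = new CastLongToStringSketch();
        try {
            cast.evaluate(new long[] {0, 1, 2});
        } catch (NullPointerException expected) {
            System.out.println("NPE without transient init");
        }

        PtfOperatorSketch op = new PtfOperatorSketch();
        op.orderExpressions.add(cast);
        op.initializeOp();
        for (byte[] s : cast.evaluate(new long[] {0, 1, 2})) {
            System.out.println(new String(s, StandardCharsets.UTF_8));
        }
    }
}
```

Initializing both lists in one place keeps the operator from depending on whether the planner happened to put a transformed column into the partition list or the order list.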

> Need to transient init partition expressions in vectorized PTFs
> ---------------------------------------------------------------
>
>                 Key: HIVE-25549
>                 URL: https://issues.apache.org/jira/browse/HIVE-25549
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Karen Coppage
>            Assignee: Karen Coppage
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Sometimes the partition column in a vectorized PTF needs some sort of
> transformation (for example, a cast). For these to work, the partition
> expression may need some of its transient variables initialized.
> Example with row_number:
> {code:java}
> create table test_rownumber (a string, b string) stored as orc;
> insert into test_rownumber values
> ('1', 'a'),
> ('2', 'b'),
> ('3', 'c'),
> ('4', 'd'),
> ('5', 'e');
> CREATE VIEW `test_rownumber_vue` AS SELECT `test_rownumber`.`a` AS `a`,
> CAST(`test_rownumber`.`a` as INT) AS `a_int`,
> `test_rownumber`.`b` as `b` from `default`.`test_rownumber`;
> set hive.vectorized.execution.enabled=true;
> select *, row_number() over(partition by a_int order by b) from test_rownumber_vue;
> {code}
> Output is:
> {code:java}
> +-----------------------+---------------------------+-----------------------+----------------------+
> | test_rownumber_vue.a  | test_rownumber_vue.a_int  | test_rownumber_vue.b  | row_number_window_0  |
> +-----------------------+---------------------------+-----------------------+----------------------+
> | 1                     | 1                         | a                     | 1                    |
> | 2                     | 2                         | b                     | 2                    |
> | 3                     | 3                         | c                     | 3                    |
> | 4                     | 4                         | d                     | 4                    |
> | 5                     | 5                         | e                     | 5                    |
> +-----------------------+---------------------------+-----------------------+----------------------+
> {code}
> But it should be the following, because row numbering restarts for each partition:
> {code:java}
> +-----------------------+---------------------------+-----------------------+----------------------+
> | test_rownumber_vue.a  | test_rownumber_vue.a_int  | test_rownumber_vue.b  | row_number_window_0  |
> +-----------------------+---------------------------+-----------------------+----------------------+
> | 1                     | 1                         | a                     | 1                    |
> | 2                     | 2                         | b                     | 1                    |
> | 3                     | 3                         | c                     | 1                    |
> | 4                     | 4                         | d                     | 1                    |
> | 5                     | 5                         | e                     | 1                    |
> +-----------------------+---------------------------+-----------------------+----------------------+
> {code}
> Explanation:
> CastStringToLong has to be executed to compute the partition column (a_int).
> Because CastStringToLong.integerPrimitiveCategory is not initialized, every
> output of CastStringToLong is null, so a_int is treated as containing only
> null values and all rows land in a single partition, i.e. partitioning is ignored.
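To make the explanation concrete, here is a hedged Java sketch of why an all-null partition column yields 1 through 5 instead of five 1s. It is not Hive code: `RowNumberSketch`, `castStringToLong`, its `initialized` flag and `rowNumbers` are hypothetical. It assumes that an uninitialized cast marks every output row null, that rows arrive sorted by partition key, and that a new partition starts only when the key differs from the previous row's key, with nulls comparing equal to each other.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical model of the bug; none of these names are Hive's.
class RowNumberSketch {

    // Stand-in for a string-to-integer cast: without its (assumed) transient
    // init, every output row comes back null.
    static Long[] castStringToLong(String[] input, boolean initialized) {
        Long[] out = new Long[input.length];
        for (int i = 0; i < input.length; i++) {
            out[i] = initialized ? Long.valueOf(input[i]) : null;
        }
        return out;
    }

    // row_number() over rows already sorted by partition key: restart the
    // count whenever the key differs from the previous row's key.
    // Objects.equals treats null == null, so all-null keys form one partition.
    static int[] rowNumbers(Long[] partitionKeys) {
        int[] result = new int[partitionKeys.length];
        for (int i = 0; i < partitionKeys.length; i++) {
            boolean newPartition = i == 0 || !Objects.equals(partitionKeys[i], partitionKeys[i - 1]);
            result[i] = newPartition ? 1 : result[i - 1] + 1;
        }
        return result;
    }

    public static void main(String[] args) {
        String[] a = {"1", "2", "3", "4", "5"};
        // Uninitialized cast: one big null partition.
        System.out.println(Arrays.toString(rowNumbers(castStringToLong(a, false)))); // [1, 2, 3, 4, 5]
        // Initialized cast: five single-row partitions.
        System.out.println(Arrays.toString(rowNumbers(castStringToLong(a, true))));  // [1, 1, 1, 1, 1]
    }
}
```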



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
