[
https://issues.apache.org/jira/browse/HIVE-25549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Karen Coppage updated HIVE-25549:
---------------------------------
Description:
{code:java}
set hive.vectorized.execution.enabled=true;
create table test_rownumber (a string, b string) stored as orc;
insert into test_rownumber values
('1', 'a'),
('2', 'b'),
('3', 'c'),
('4', 'd'),
('5', 'e');
CREATE VIEW `test_rownumber_vue` AS SELECT `test_rownumber`.`a` AS
`a`,CAST(`test_rownumber`.`a` as INT) AS `a_int`,
`test_rownumber`.`b` as `b` from `default`.`test_rownumber`;
set hive.vectorized.execution.enabled=true;
select *, row_number() over(partition by a_int order by b) from
test_rownumber_vue;
{code}
Output is:
{code:java}
+-----------------------+---------------------------+-----------------------+----------------------+
| test_rownumber_vue.a  | test_rownumber_vue.a_int  | test_rownumber_vue.b  | row_number_window_0  |
+-----------------------+---------------------------+-----------------------+----------------------+
| 1                     | 1                         | a                     | 1                    |
| 2                     | 2                         | b                     | 2                    |
| 3                     | 3                         | c                     | 3                    |
| 4                     | 4                         | d                     | 4                    |
| 5                     | 5                         | e                     | 5                    |
+-----------------------+---------------------------+-----------------------+----------------------+
{code}
But it should be this, because we should restart row numbering for each
partition:
{code:java}
+-----------------------+---------------------------+-----------------------+----------------------+
| test_rownumber_vue.a  | test_rownumber_vue.a_int  | test_rownumber_vue.b  | row_number_window_0  |
+-----------------------+---------------------------+-----------------------+----------------------+
| 1                     | 1                         | a                     | 1                    |
| 2                     | 2                         | b                     | 1                    |
| 3                     | 3                         | c                     | 1                    |
| 4                     | 4                         | d                     | 1                    |
| 5                     | 5                         | e                     | 1                    |
+-----------------------+---------------------------+-----------------------+----------------------+
{code}
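For reference, the expected semantics can be checked outside Hive. The following is only a minimal Python sketch of what row_number() over (partition by a_int order by b) should produce for the sample data; it is not Hive code, and the helper name row_number and the tuple layout (a, a_int, b) are assumptions made for illustration.

```python
from itertools import groupby

# Rows as the view test_rownumber_vue would expose them: (a, a_int, b).
rows = [(a, int(a), b) for a, b in
        [("1", "a"), ("2", "b"), ("3", "c"), ("4", "d"), ("5", "e")]]


def row_number(rows, partition_key, order_key):
    """Append a 1-based counter to each row, restarting for every partition."""
    # Sort by partition first, then by the ORDER BY key inside each partition.
    ordered = sorted(rows, key=lambda r: (partition_key(r), order_key(r)))
    out = []
    for _, partition in groupby(ordered, key=partition_key):
        # enumerate restarts at 1 for each new partition.
        for n, row in enumerate(partition, start=1):
            out.append(row + (n,))
    return out


# partition by a_int (index 1), order by b (index 2)
result = row_number(rows, partition_key=lambda r: r[1], order_key=lambda r: r[2])
for row in result:
    print(row)
```

Every a_int value is distinct in the sample data, so each partition holds exactly one row and every row gets row number 1, matching the expected table above.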
> Need to transient init partition expressions in vectorized PTFs
> ---------------------------------------------------------------
>
> Key: HIVE-25549
> URL: https://issues.apache.org/jira/browse/HIVE-25549
> Project: Hive
> Issue Type: Bug
> Reporter: Karen Coppage
> Assignee: Karen Coppage
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Sometimes the partition in a vectorized PTF needs some sort of transformation
> (in the example below: CastStringToLong). For these to work the partition
> expression may need some transient varia
> Example with row_number:
> {code:java}
> create table test_rownumber (a string, b string) stored as orc;
> insert into test_rownumber values
> ('1', 'a'),
> ('2', 'b'),
> ('3', 'c'),
> ('4', 'd'),
> ('5', 'e');
> CREATE VIEW `test_rownumber_vue` AS SELECT `test_rownumber`.`a` AS
> `a`,CAST(`test_rownumber`.`a` as INT) AS `a_int`,
> `test_rownumber`.`b` as `b` from `default`.`test_rownumber`;
> set hive.vectorized.execution.enabled=true;
> select *, row_number() over(partition by a_int order by b) from
> test_rownumber_vue;
> {code}
> Output is:
> {code:java}
> +-----------------------+---------------------------+-----------------------+----------------------+
> | test_rownumber_vue.a  | test_rownumber_vue.a_int  | test_rownumber_vue.b  | row_number_window_0  |
> +-----------------------+---------------------------+-----------------------+----------------------+
> | 1                     | 1                         | a                     | 1                    |
> | 2                     | 2                         | b                     | 2                    |
> | 3                     | 3                         | c                     | 3                    |
> | 4                     | 4                         | d                     | 4                    |
> | 5                     | 5                         | e                     | 5                    |
> +-----------------------+---------------------------+-----------------------+----------------------+
> {code}
> But it should be this, because we restart the row numbering for each
> partition:
> {code:java}
> +-----------------------+---------------------------+-----------------------+----------------------+
> | test_rownumber_vue.a  | test_rownumber_vue.a_int  | test_rownumber_vue.b  | row_number_window_0  |
> +-----------------------+---------------------------+-----------------------+----------------------+
> | 1                     | 1                         | a                     | 1                    |
> | 2                     | 2                         | b                     | 1                    |
> | 3                     | 3                         | c                     | 1                    |
> | 4                     | 4                         | d                     | 1                    |
> | 5                     | 5                         | e                     | 1                    |
> +-----------------------+---------------------------+-----------------------+----------------------+
> {code}