[
https://issues.apache.org/jira/browse/HIVE-23541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gopal Vijayaraghavan updated HIVE-23541:
----------------------------------------
Affects Version/s: 4.0.0, 3.1.2
> Vectorization: Unbounded following window function start producing results
> too early
> ------------------------------------------------------------------------------------
>
> Key: HIVE-23541
> URL: https://issues.apache.org/jira/browse/HIVE-23541
> Project: Hive
> Issue Type: Bug
> Affects Versions: 4.0.0, 3.1.2
> Reporter: Gopal Vijayaraghavan
> Priority: Major
>
> ReduceRecordSource signals the end of a group for a reducer input whenever
> the entire key changes.
> ReduceRecordSource::processVectorGroup calls
> reducer.setNextVectorBatchGroupStatus(/* isLastGroupBatch */ true); when the
> last group is being processed.
> However, for PTF window functions with UNBOUNDED FOLLOWING, this is triggered
> by the entire key changing, not by the partition key changing.
> As a result, the VectorPTFOperator treats a change in the sort key as a
> change of the partition key and starts producing results too early.
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/ptf/VectorPTFOperator.java#L399
> {code}
> create temporary table test2(id STRING,name STRING,event_dt date) stored as
> orc;
> insert into test2 values ('100','A','2019-08-15'), ('100','A','2019-10-12');
> SELECT name, event_dt, first_value(event_dt) over (PARTITION BY name ORDER BY
> event_dt desc ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) last_event_dt
> FROM test2; -- streaming FIRST_VALUE with DESCENDING
> SELECT name, event_dt, last_value(event_dt) over (PARTITION BY name ORDER BY
> event_dt asc ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING )
> last_event_dt FROM test2; -- non-streaming LAST_VALUE with ASCENDING
> {code}
> These two queries should return identical results, with the streaming version
> being significantly faster than the non-streaming one, due to the lack of
> buffered/spilled rows with streaming.
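The effect described above can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not Hive code: the class GroupBoundarySketch, its lastValue and flush methods, and the flushOnFullKeyChange flag are hypothetical names, and each row is reduced to a {name, event_dt} pair. It buffers rows for an UNBOUNDED PRECEDING to UNBOUNDED FOLLOWING frame and emits last_value(event_dt) when a boundary is seen, either on any change of the full key (partition key plus sort key) or only on a change of the partition key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GroupBoundarySketch {

    // Hypothetical model, not Hive's implementation.
    // Each row is {name, eventDt}; rows arrive sorted by (name, eventDt).
    // Returns last_value(event_dt) for every input row, in input order.
    public static List<String> lastValue(List<String[]> sorted,
                                         boolean flushOnFullKeyChange) {
        List<String> out = new ArrayList<>();
        List<String[]> buffer = new ArrayList<>();
        for (String[] row : sorted) {
            if (!buffer.isEmpty()) {
                String[] prev = buffer.get(buffer.size() - 1);
                boolean partitionChanged = !prev[0].equals(row[0]);
                boolean fullKeyChanged = partitionChanged || !prev[1].equals(row[1]);
                // Assumed faulty behaviour: flush on any key change.
                // Expected behaviour: flush only when the partition key changes.
                boolean boundary = flushOnFullKeyChange ? fullKeyChanged : partitionChanged;
                if (boundary) {
                    flush(buffer, out);
                }
            }
            buffer.add(row);
        }
        flush(buffer, out); // end of input closes the final partition
        return out;
    }

    // Emits the last buffered eventDt once per buffered row, then clears the buffer.
    private static void flush(List<String[]> buffer, List<String> out) {
        if (buffer.isEmpty()) {
            return;
        }
        String last = buffer.get(buffer.size() - 1)[1];
        for (int i = 0; i < buffer.size(); i++) {
            out.add(last);
        }
        buffer.clear();
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[] {"A", "2019-08-15"},
            new String[] {"A", "2019-10-12"});
        System.out.println(lastValue(rows, true));  // [2019-08-15, 2019-10-12]
        System.out.println(lastValue(rows, false)); // [2019-10-12, 2019-10-12]
    }
}
```

With the two rows from the test2 example, flushing on every full-key change yields a different value per row, while flushing only on a partition-key change yields 2019-10-12 for both rows, which is what the streaming FIRST_VALUE query with descending order would also produce.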
--
This message was sent by Atlassian Jira
(v8.3.4#803005)