[
https://issues.apache.org/jira/browse/HIVE-24804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
László Bodor updated HIVE-24804:
--------------------------------
Fix Version/s: 4.0.0
> Introduce check: RANGE with offset PRECEDING/FOLLOWING requires at least one
> ORDER BY column
> --------------------------------------------------------------------------------------------
>
> Key: HIVE-24804
> URL: https://issues.apache.org/jira/browse/HIVE-24804
> Project: Hive
> Issue Type: Bug
> Reporter: László Bodor
> Assignee: László Bodor
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> Currently, in Hive, we can run a windowing function with range specification
> but without an ORDER BY clause:
> {code}
> create table vector_ptf_part_simple_text(p_mfgr string, p_name string,
> p_retailprice double, rowindex string);
> select p_mfgr, p_name, rowindex,
> count(*) over(partition by p_mfgr range between 1 preceding and current row)
> as cs1,
> count(*) over(partition by p_mfgr range between 3 preceding and current row)
> as cs2
> from vector_ptf_part_simple_text;
> {code}
> This is confusing, because without an order by clause, the range is out of
> context, we don't know by which column should we calculate the range.
> Tested on Postgres, it throws an exception:
> {code}
> create table vector_ptf_part_simple_text(p_mfgr varchar(10), p_name
> varchar(10), p_retailprice integer, rowindex varchar(10));
> select p_mfgr, p_name, rowindex,
> count(*) over(partition by p_mfgr range between 1 preceding and current row)
> as cs1,
> count(*) over(partition by p_mfgr range between 3 preceding and current row)
> as cs2
> from vector_ptf_part_simple_text;
> *RANGE with offset PRECEDING/FOLLOWING requires exactly one ORDER BY column*
> {code}
> further references:
> https://cloud.google.com/bigquery/docs/reference/standard-sql/analytic-function-concepts
> {code}
> RANGE: Computes the window frame based on a logical range of rows around the
> current row, based on the current row’s ORDER BY key value. The provided
> range value is added or subtracted to the current row's key value to define a
> starting or ending range boundary for the window frame. In a range-based
> window frame, there must be exactly one expression in the ORDER BY clause,
> and the expression must have a numeric type.
> {code}
> https://docs.oracle.com/cd/E17952_01/mysql-8.0-en/window-functions-frames.html
> {code}
> Without ORDER BY: The default frame includes all partition rows (because,
> without ORDER BY, all partition rows are peers). The default is equivalent to
> this frame specification:
> RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
> {code}
> I believe this one could only make sense if you don't specify range,
> otherwise the sql statement reflects a different thing from which is returned
> by the engine
--
This message was sent by Atlassian Jira
(v8.3.4#803005)