[
https://issues.apache.org/jira/browse/HIVE-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734784#comment-14734784
]
Aihua Xu commented on HIVE-11742:
---------------------------------
I didn't know that last_value behaves like that for {{last_value(x) over
(partition by y order by z)}}. Seems it will have different behaviors from
windowing function {{last_value(x) over (partition by y order by z rows between
x preceding and y following)}} and may confuse people. I think it makes sense
to make the change.
Of course it 's non-compatible change. We may have to add a configuration to
allow enabling the new behavior.
> last_value window specifier enforces ordering as a partition
> ------------------------------------------------------------
>
> Key: HIVE-11742
> URL: https://issues.apache.org/jira/browse/HIVE-11742
> Project: Hive
> Issue Type: Bug
> Components: PTF-Windowing
> Reporter: Prateek Rungta
>
> [HIVE-4262|https://issues.apache.org/jira/browse/HIVE-4262] changed the
> partitioning behavior of the last_value function. For a specified
> last_value() OVER X. The ordering spec within X is used in addition to the
> partition spec for partitioning. i.e. last_value(a) OVER (PARTITION BY i
> ORDER BY j) operates last_value(a) on all rows within the unique combination
> of (i,j). The behavior I'd expect is for PARTITION BY i to define the
> partitioning, and ORDER BY to define the ordering within the PARTITION. i.e.
> last_value(a) OVER (PARTITION BY i ORDER BY j) should operate last_value(a)
> on all rows within the unique values of (i), ordered by j within the
> partition.
> This was changed to be consistent with how SQLServer handled such queries.
> [SQLServer
> Docs|https://msdn.microsoft.com/en-us/library/hh231517.aspx?f=255&MSPPError=-2147217396]
> describe their example (which performs as Hive does):
> {quote}
> The PARTITION BY clause partitions the employees by department and the
> LAST_VALUE function is applied to each partition independently. The ORDER BY
> clause specified in the OVER clause determines the logical order in which the
> LAST_VALUE function is applied to the rows in each partition.
> {quote}
> To me, their behavior is inconsistent with their description. I've filled an
> [upstream
> bug|https://connect.microsoft.com/SQLServer/feedback/details/1753482] with
> Microsoft for the same.
> [Oracle|https://oracle-base.com/articles/misc/first-value-and-last-value-analytic-functions]
> and
> [Redshift|http://docs.aws.amazon.com/redshift/latest/dg/r_Examples_of_firstlast_WF.html]
> both exhibit the behavior I'd expect.
> Considering Hive-4262 has been in core for 2+ years, I don't think we can
> change the behavior without potentially impacting clients. But I would like a
> way to enable the expected behavior at the least (behind a config flag
> maybe?). What do you think?
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)