[ https://issues.apache.org/jira/browse/HIVE-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xuefu Zhang reassigned HIVE-11742: ---------------------------------- Assignee: Prateek Rungta > last_value window specifier enforces ordering as a partition > ------------------------------------------------------------ > > Key: HIVE-11742 > URL: https://issues.apache.org/jira/browse/HIVE-11742 > Project: Hive > Issue Type: Bug > Components: PTF-Windowing > Reporter: Prateek Rungta > Assignee: Prateek Rungta > > [HIVE-4262|https://issues.apache.org/jira/browse/HIVE-4262] changed the > partitioning behavior of the last_value function. For a specified > last_value() OVER X. The ordering spec within X is used in addition to the > partition spec for partitioning. i.e. last_value(a) OVER (PARTITION BY i > ORDER BY j) operates last_value(a) on all rows within the unique combination > of (i,j). The behavior I'd expect is for PARTITION BY i to define the > partitioning, and ORDER BY to define the ordering within the PARTITION. i.e. > last_value(a) OVER (PARTITION BY i ORDER BY j) should operate last_value(a) > on all rows within the unique values of (i), ordered by j within the > partition. > This was changed to be consistent with how SQLServer handled such queries. > [SQLServer > Docs|https://msdn.microsoft.com/en-us/library/hh231517.aspx?f=255&MSPPError=-2147217396] > describe their example (which performs as Hive does): > {quote} > The PARTITION BY clause partitions the employees by department and the > LAST_VALUE function is applied to each partition independently. The ORDER BY > clause specified in the OVER clause determines the logical order in which the > LAST_VALUE function is applied to the rows in each partition. > {quote} > To me, their behavior is inconsistent with their description. I've filled an > [upstream > bug|https://connect.microsoft.com/SQLServer/feedback/details/1753482] with > Microsoft for the same. > [Oracle|https://oracle-base.com/articles/misc/first-value-and-last-value-analytic-functions] > and > [Redshift|http://docs.aws.amazon.com/redshift/latest/dg/r_Examples_of_firstlast_WF.html] > both exhibit the behavior I'd expect. > Considering Hive-4262 has been in core for 2+ years, I don't think we can > change the behavior without potentially impacting clients. But I would like a > way to enable the expected behavior at the least (behind a config flag > maybe?). What do you think? -- This message was sent by Atlassian JIRA (v6.3.4#6332)