Harish Butani created HIVE-10586:
------------------------------------
Summary: Plans for Queries with Select distinct and Windowing are
incorrect
Key: HIVE-10586
URL: https://issues.apache.org/jira/browse/HIVE-10586
Project: Hive
Issue Type: Bug
Components: PTF-Windowing, Query Planning
Reporter: Harish Butani
Thanks to [~yhuai] for pointing this out.
The generated plan places the GBy operator (for the Select Distinct) below
the PTFOp. One would expect the Select Distinct to be applied last, after the
windowing functions have been evaluated. [~yhuai] confirmed this behavior
(distinct applied last) in PostgreSQL. I think the following paragraph in the
SQL spec specifies this order (though I am not an expert in deciphering the
language of the spec; if an expert on the spec wants to chime in, please do):
{noformat}
Point h) on Page 222, in the 2011 SQL Spec, seems to state this:
h) Case:
i) If OF is simply contained in a <query specification> QSX, then QSX is
equivalent to:
SELECT SQ SLNEW TENEW
{noformat}
Here is an example from windowing.q:
{noformat}
35. testDistinctWithWindowing
select DISTINCT p_mfgr, p_name, p_size,
  sum(p_size) over w1 as s
from part
window w1 as (distribute by p_mfgr sort by p_name
              rows between 2 preceding and 2 following)
{noformat}
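To illustrate why the order matters, here is a minimal Python sketch that evaluates a query of this shape in both orders. It is an illustration only, not a reproduction of Hive's plan: the sample rows are made up, the helpers `windowed_sum` and `distinct` are hypothetical names, and the "distinct first" variant assumes de-duplication on the three base columns. With a duplicate input row, the two orders give different window sums.

```python
from itertools import groupby

# Hypothetical sample rows (p_mfgr, p_name, p_size); note the duplicate row.
PART = [
    ("m1", "a", 2),
    ("m1", "a", 2),
    ("m1", "b", 3),
]


def windowed_sum(rows, preceding=2, following=2):
    """Append sum(p_size) computed per p_mfgr partition, ordered by p_name,
    over the frame 'rows between `preceding` preceding and `following` following'."""
    out = []
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for _, grp in groupby(ordered, key=lambda r: r[0]):
        part = list(grp)
        for i, row in enumerate(part):
            lo = max(0, i - preceding)
            hi = min(len(part), i + following + 1)
            out.append(row + (sum(r[2] for r in part[lo:hi]),))
    return out


def distinct(rows):
    """Remove duplicate rows; sorted only to make the output deterministic."""
    return sorted(set(rows))


# Expected order: evaluate the window function, then de-duplicate.
window_then_distinct = distinct(windowed_sum(PART))

# Assumed model of the wrong order: de-duplicate, then evaluate the window.
distinct_then_window = windowed_sum(distinct(PART))

print(window_then_distinct)  # [('m1', 'a', 2, 7), ('m1', 'b', 3, 7)]
print(distinct_then_window)  # [('m1', 'a', 2, 5), ('m1', 'b', 3, 5)]
```

In the first result the duplicate row still contributes to every frame, so each sum is 7; in the second it has already been removed, so each sum is 5.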