Harish Butani created HIVE-10586:
------------------------------------

             Summary: Plans for Queries with Select distinct and Windowing are incorrect
                 Key: HIVE-10586
                 URL: https://issues.apache.org/jira/browse/HIVE-10586
             Project: Hive
          Issue Type: Bug
          Components: PTF-Windowing, Query Planning
            Reporter: Harish Butani

Thanks to [~yhuai] for pointing this out.

The plan generated has the GBy Operator (for the Select Distinct) placed below the PTFOp. One would expect the Select Distinct to happen last. [~yhuai] confirmed this behavior in postgres.

I think point h) on page 222 of the 2011 SQL spec states this order (though I am not an expert in deciphering the language in the spec; if an expert on the spec wants to chime in, please do):

{noformat}
h) Case:
   i) If OF is simply contained in a <query specification> QSX, then QSX is equivalent to:
        SELECT SQ SLNEW TENEW
{noformat}

Here is an example from windowing.q:

{noformat}
-- 35. testDistinctWithWindowing
select DISTINCT p_mfgr, p_name, p_size,
       sum(p_size) over w1 as s
from part
window w1 as (distribute by p_mfgr sort by p_name rows between 2 preceding and 2 following)
{noformat}
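
To make the difference concrete, here is a minimal Python sketch of the two operator orders for a query of this shape. It is only an illustration, not Hive code: the three rows are made up (they are not from the real part table), the helpers windowed_sum and distinct are hypothetical, and it assumes that "GBy below the PTFOp" means the de-duplication runs before the window function is evaluated.

```python
from itertools import groupby

# Hypothetical rows of (p_mfgr, p_name, p_size); NOT taken from the real `part` table.
ROWS = [
    ("m1", "a", 2),
    ("m1", "a", 2),  # exact duplicate of the previous row
    ("m1", "b", 3),
]


def windowed_sum(rows, preceding=2, following=2):
    """Append sum(p_size) over (partition by p_mfgr order by p_name
    rows between `preceding` preceding and `following` following)."""
    out = []
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for _, part in groupby(ordered, key=lambda r: r[0]):
        part = list(part)
        for i, row in enumerate(part):
            lo = max(0, i - preceding)               # clip frame at partition start
            hi = min(len(part), i + following + 1)   # clip frame at partition end
            out.append(row + (sum(r[2] for r in part[lo:hi]),))
    return out


def distinct(rows):
    """Drop duplicate rows, keeping first-seen order."""
    return list(dict.fromkeys(rows))


# Order one would expect: evaluate the window function, then apply DISTINCT last.
expected = distinct(windowed_sum(ROWS))

# Order assumed for the reported plan: de-duplicate (GBy) first, then window.
reported = windowed_sum(distinct(ROWS))

print("window then distinct:", expected)
print("distinct then window:", reported)
```

With these rows the two orders disagree on s: de-duplicating first removes a row from the window frame, so the sum shrinks. On input without duplicate (p_mfgr, p_name, p_size) rows the two orders would give the same answer, which may be why the problem is easy to miss.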