[
https://issues.apache.org/jira/browse/HIVE-9534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14528903#comment-14528903
]
Chaoyu Tang commented on HIVE-9534:
-----------------------------------
Actually as of Oracle 11.2 (the version I tested was Oracle Database 11g
Express Edition Release 11.2.0.2.0 - 64bit Production), the window function is
not supported. Though the query "select avg(distinct col1) over() from
testwindow;", where avg acts on all rows, works, query with windowing_clause
does actually not work. for example:
{code}
Query:
select avg(distinct col1) over(order by col2 rows between 1 preceding and 1
following) from testwindow;
---
Error:
ORA-30487: ORDER BY not allowed here
30487. 00000 - "ORDER BY not allowed here"
*Cause: DISTINCT functions and RATIO_TO_REPORT cannot have an ORDER BY
{code}
Based on Oracle document
http://docs.oracle.com/cd/E11882_01/server.112/e25554/analysis.htm
{code}
Note that the DISTINCT keyword is not supported in windowing functions except
for MAX and MIN.
{code}
Based on Hive plan for "select avg(distinct col1) over() from testwindow;" it
looks like Hive was computing the distinct value of col1:
{code}
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-1
Map Reduce
Map Operator Tree:
TableScan
alias: testwindow
Statistics: Num rows: 6 Data size: 18 Basic stats: COMPLETE Column
stats: NONE
Select Operator
expressions: col1 (type: int)
outputColumnNames: col1
Statistics: Num rows: 6 Data size: 18 Basic stats: COMPLETE
Column stats: NONE
Group By Operator
aggregations: avg(DISTINCT col1)
keys: col1 (type: int)
mode: hash
outputColumnNames: _col0, _col1
Statistics: Num rows: 6 Data size: 18 Basic stats: COMPLETE
Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: int)
sort order: +
Statistics: Num rows: 6 Data size: 18 Basic stats: COMPLETE
Column stats: NONE
Reduce Operator Tree:
Group By Operator
aggregations: avg(DISTINCT KEY._col0:0._col0)
mode: mergepartial
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column
stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column
stats: NONE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format:
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
{code}
> incorrect result set for query that projects a windowed aggregate
> -----------------------------------------------------------------
>
> Key: HIVE-9534
> URL: https://issues.apache.org/jira/browse/HIVE-9534
> Project: Hive
> Issue Type: Bug
> Components: SQL
> Reporter: N Campbell
> Assignee: Chaoyu Tang
>
> Result set returned by Hive has one row instead of 5
> {code}
> select avg(distinct tsint.csint) over () from tsint
> create table if not exists TSINT (RNUM int , CSINT smallint)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n'
> STORED AS TEXTFILE;
> 0|\N
> 1|-1
> 2|0
> 3|1
> 4|10
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)