Yin Huai created HIVE-4809:
------------------------------
Summary: ReduceSinkOperator of PTFOperator can have redundant key columns
Key: HIVE-4809
URL: https://issues.apache.org/jira/browse/HIVE-4809
Project: Hive
Issue Type: Improvement
Reporter: Yin Huai
For example, consider this simple query:
{code:sql}
SELECT x.a, x.b, count(x.b) OVER (PARTITION BY x.a) FROM src x;
{code}
Its plan is:
{code}
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage
STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        x
          TableScan
            alias: x
            Reduce Output Operator
              key expressions:
                    expr: a
                    type: int
                    expr: a
                    type: int
              sort order: ++
              Map-reduce partition columns:
                    expr: a
                    type: int
              tag: -1
              value expressions:
                    expr: a
                    type: int
                    expr: b
                    type: string
      Reduce Operator Tree:
        Extract
          PTF Operator
            Select Operator
              expressions:
                    expr: _col0
                    type: int
                    expr: _col1
                    type: string
                    expr: _wcol0
                    type: bigint
              outputColumnNames: _col0, _col1, _col2
              File Output Operator
                compressed: false
                GlobalTableId: 0
                table:
                    input format: org.apache.hadoop.mapred.TextInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
  Stage: Stage-0
    Fetch Operator
      limit: -1
{code}
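The ReduceSinkOperator lists "a" twice in its key expressions (note the "++"
sort order), even though the query partitions on "a" alone. The second copy
adds nothing to the sort, but it is still serialized into the key of every row
the mapper emits, so this redundancy increases the size of the map output.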
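
As a rough illustration of the kind of cleanup this issue asks for, the sketch below drops repeated key columns while keeping the first occurrence and its sort direction. It is not Hive's code: the function name dedupe_key_columns and the list-plus-string representation of the key expressions and sort order are assumptions made only for this example. Dropping a later duplicate is safe for ordering because a repeated column can never break a tie that its first occurrence left unresolved.

```python
def dedupe_key_columns(key_cols, sort_order):
    """Remove repeated key columns, keeping the first occurrence of each.

    key_cols   -- column names in key order, e.g. ["a", "a"]  (hypothetical model)
    sort_order -- one '+' or '-' per key column, e.g. "++"    (hypothetical model)
    Returns (deduplicated columns, matching sort order string).
    """
    if len(key_cols) != len(sort_order):
        raise ValueError("need exactly one sort direction per key column")

    seen = set()
    kept_cols = []
    kept_order = []
    for col, direction in zip(key_cols, sort_order):
        if col in seen:
            # A later duplicate cannot change the ordering, so skip it.
            continue
        seen.add(col)
        kept_cols.append(col)
        kept_order.append(direction)
    return kept_cols, "".join(kept_order)


# Key expressions and sort order taken from the plan above.
cols, order = dedupe_key_columns(["a", "a"], "++")
print(cols, order)
```

For the plan above, this would reduce the key expressions to a single "a" with sort order "+", leaving the Map-reduce partition columns unchanged.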