[ https://issues.apache.org/jira/browse/HIVE-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated HIVE-4809:
-----------------------------------

    Affects Version/s: 0.11.0
    
> ReduceSinkOperator of PTFOperator can have redundant key columns
> ----------------------------------------------------------------
>
>                 Key: HIVE-4809
>                 URL: https://issues.apache.org/jira/browse/HIVE-4809
>             Project: Hive
>          Issue Type: Improvement
>          Components: PTF-Windowing
>    Affects Versions: 0.11.0
>            Reporter: Yin Huai
>            Assignee: Yin Huai
>
> For example, consider this simple query:
> {code:sql}
> SELECT x.a, x.b, count(x.b) OVER (PARTITION BY x.a) FROM src x;
> {code}
> Its plan is:
> {code}
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 is a root stage
> STAGE PLANS:
>   Stage: Stage-1
>     Map Reduce
>       Alias -> Map Operator Tree:
>         x 
>           TableScan
>             alias: x
>             Reduce Output Operator
>               key expressions:
>                     expr: a
>                     type: int
>                     expr: a
>                     type: int
>               sort order: ++
>               Map-reduce partition columns:
>                     expr: a
>                     type: int
>               tag: -1
>               value expressions:
>                     expr: a
>                     type: int
>                     expr: b
>                     type: string
>       Reduce Operator Tree:
>         Extract
>           PTF Operator
>             Select Operator
>               expressions:
>                     expr: _col0
>                     type: int
>                     expr: _col1
>                     type: string
>                     expr: _wcol0
>                     type: bigint
>               outputColumnNames: _col0, _col1, _col2
>               File Output Operator
>                 compressed: false
>                 GlobalTableId: 0
>                 table:
>                     input format: org.apache.hadoop.mapred.TextInputFormat
>                     output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
> {code}
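> The ReduceSinkOperator lists column "a" twice in its key expressions (sort order ++),
> although the window is partitioned by "a" alone. The second copy never changes how rows
> are sorted or partitioned, but it is still written into the key of every map output
> record, so this redundancy increases the size of the map output.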
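
A minimal sketch of the deduplication the issue asks for, assuming key expressions can be compared as plain strings (the real planner works on expression descriptor objects, not strings). The class `RedundantKeyDedup`, the holder `KeySpec`, and the method `dedup` are hypothetical names for illustration, not Hive APIs. It keeps the first occurrence of each key together with its sort-order character and drops later repeats. This is safe because a repeated key is always equal whenever its first occurrence is equal, so it can never break a tie.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RedundantKeyDedup {

    /** Hypothetical holder: key expressions (as strings) plus one sort-order char per key, e.g. "++". */
    public static final class KeySpec {
        public final List<String> keys;
        public final String sortOrder;

        public KeySpec(List<String> keys, String sortOrder) {
            if (keys.size() != sortOrder.length()) {
                throw new IllegalArgumentException("expected one sort-order char per key");
            }
            this.keys = keys;
            this.sortOrder = sortOrder;
        }
    }

    /**
     * Drops every key expression that already appeared earlier in the list,
     * along with its sort-order character. Row ordering is unchanged; only
     * the serialized key gets shorter.
     */
    public static KeySpec dedup(KeySpec in) {
        Set<String> seen = new HashSet<>();
        List<String> keys = new ArrayList<>();
        StringBuilder order = new StringBuilder();
        for (int i = 0; i < in.keys.size(); i++) {
            String k = in.keys.get(i);
            if (seen.add(k)) { // add() returns true only for the first occurrence
                keys.add(k);
                order.append(in.sortOrder.charAt(i));
            }
        }
        return new KeySpec(keys, order.toString());
    }

    public static void main(String[] args) {
        // Key expressions and sort order from the plan in the issue: a, a with "++".
        KeySpec after = dedup(new KeySpec(List.of("a", "a"), "++"));
        System.out.println(after.keys + " " + after.sortOrder); // prints "[a] +"
    }
}
```

Keeping the first occurrence, rather than the last, preserves the relative order of the remaining keys. That matters because the partition columns must stay at the front of the key.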

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
