[jira] [Resolved] (HIVE-20757) Autogather stats doesn't work when SDPO (sort dynamic partition optimization) is ON

Vineet Garg (JIRA) Mon, 03 Dec 2018 14:02:35 -0800


     [ https://issues.apache.org/jira/browse/HIVE-20757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Vineet Garg resolved HIVE-20757.
--------------------------------
    Resolution: Duplicate

> Autogather stats doesn't work when SDPO (sort dynamic partition optimization) is ON
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-20757
>                 URL: https://issues.apache.org/jira/browse/HIVE-20757
>             Project: Hive
>          Issue Type: Bug
>          Components: Statistics
>    Affects Versions: 4.0.0
>            Reporter: Vineet Garg
>            Priority: Major
>
> *Reproducer*
> {code:sql}
> set hive.optimize.sort.dynamic.partition=true;
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.stats.autogather=true;
> create table t11(i int, j int) partitioned by (s string);
> insert into t11 partition(s) values(3,4, 'p1'),(4,5, 'p2'),(6,9,'p3');
> hive> desc formatted t11 j;
> OK
> col_name              j
> data_type             int
> min
> max
> num_nulls
> distinct_count
> avg_col_len
> max_col_len
> num_trues
> num_falses
> bitVector
> comment               from deserializer
> COLUMN_STATS_ACCURATE {}
> {code}
> {code:sql}
> hive> explain insert into t11 partition(s) values(3,4, 'p1'),(4,5, 'p2'),(6,9,'p3');
> STAGE PLANS:
>   Stage: Stage-1
>     Tez
>       DagId: vgarg_20181016113701_f3aa9f8f-b38b-47a8-8149-b5521bf072f6:13
>       Edges:
>         Reducer 2 <- Map 1 (SIMPLE_EDGE)
>       DagName: vgarg_20181016113701_f3aa9f8f-b38b-47a8-8149-b5521bf072f6:13
>       Vertices:
>         Map 1
>             Map Operator Tree:
>                 TableScan
>                   alias: _dummy_table
>                   Row Limit Per Split: 1
>                   Statistics: Num rows: 1 Data size: 10 Basic stats: COMPLETE Column stats: COMPLETE
>                   Select Operator
>                     expressions: array(const struct(3,4,'p1'),const struct(4,5,'p2'),const struct(6,9,'p3')) (type: array<struct<col1:int,col2:int,col3:string>>)
>                     outputColumnNames: _col0
>                     Statistics: Num rows: 1 Data size: 64 Basic stats: COMPLETE Column stats: COMPLETE
>                     UDTF Operator
>                       Statistics: Num rows: 1 Data size: 64 Basic stats: COMPLETE Column stats: COMPLETE
>                       function name: inline
>                       Select Operator
>                         expressions: col1 (type: int), col2 (type: int), col3 (type: string)
>                         outputColumnNames: _col0, _col1, _col2
>                         Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
>                         Reduce Output Operator
>                           key expressions: _col2 (type: string)
>                           sort order: +
>                           Map-reduce partition columns: _col2 (type: string)
>                           Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
>                           value expressions: _col0 (type: int), _col1 (type: int)
>         Reducer 2
>             Execution mode: vectorized
>             Reduce Operator Tree:
>               Select Operator
>                 expressions: VALUE._col0 (type: int), VALUE._col1 (type: int), KEY._col2 (type: string)
>                 outputColumnNames: _col0, _col1, _col2
>                 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
>                 File Output Operator
>                   compressed: false
>                   Dp Sort State: PARTITION_SORTED
>                   Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
>                   table:
>                       input format: org.apache.hadoop.mapred.TextInputFormat
>                       output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                       serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>                       name: default.t11
>   Stage: Stage-2
>     Dependency Collection
>   Stage: Stage-0
>     Move Operator
>       tables:
>           partition:
>             s
>           replace: false
>           table:
>               input format: org.apache.hadoop.mapred.TextInputFormat
>               output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>               serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>               name: default.t11
>   Stage: Stage-3
>     Stats Work
>       Basic Stats Work:
>       Column Stats Desc:
>           Columns: i, j
>           Column Types: int, int
>           Table: default.t11
> {code}
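> Notice that the explain plan is missing the autogather stats branch: Stage-1 writes straight to the File Output Operator, with no operator that computes column statistics for i and j.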
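>
> A possible cross-check, sketched under assumptions and not verified as part of this report: rerun the same explain with SDPO turned off and compare the plans. If autogather is working on that path, the plan would be expected to carry an extra branch that aggregates column statistics. The explicit analyze statement at the end is only a suggested way to populate the column stats by hand in the meantime; whether it is an acceptable workaround depends on the deployment.
> {code:sql}
> -- Assumption: same session and table t11 as in the reproducer above.
> -- Turn SDPO off; keep the other settings unchanged.
> set hive.optimize.sort.dynamic.partition=false;
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.stats.autogather=true;
> -- Compare this plan against the one above and look for a stats-gathering branch.
> explain insert into t11 partition(s) values(3,4, 'p1'),(4,5, 'p2'),(6,9,'p3');
> -- Suggested manual fallback: compute column stats for all partitions explicitly.
> analyze table t11 partition(s) compute statistics for columns;
> -- Re-check the column-level stats for j.
> desc formatted t11 j;
> {code}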



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
