[
https://issues.apache.org/jira/browse/HIVE-20757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vineet Garg resolved HIVE-20757.
--------------------------------
Resolution: Duplicate
> Autogather stats doesn't work when SDPO (sort dynamic partition optimization)
> is ON
> -----------------------------------------------------------------------------------
>
> Key: HIVE-20757
> URL: https://issues.apache.org/jira/browse/HIVE-20757
> Project: Hive
> Issue Type: Bug
> Components: Statistics
> Affects Versions: 4.0.0
> Reporter: Vineet Garg
> Priority: Major
>
> *Reproducer*
> {code:sql}
> set hive.optimize.sort.dynamic.partition=true;
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.stats.autogather=true;
> create table t11(i int, j int) partitioned by (s string);
> insert into t11 partition(s) values(3,4, 'p1'),(4,5, 'p2'),(6,9,'p3');
> hive> desc formatted t11 j;
> OK
> col_name j
> data_type int
> min
> max
> num_nulls
> distinct_count
> avg_col_len
> max_col_len
> num_trues
> num_falses
> bitVector
> comment from deserializer
> COLUMN_STATS_ACCURATE {}
> {code}
> {code:sql}
> hive> explain insert into t11 partition(s) values(3,4, 'p1'),(4,5, 'p2'),(6,9,'p3');
> STAGE PLANS:
> Stage: Stage-1
> Tez
> DagId: vgarg_20181016113701_f3aa9f8f-b38b-47a8-8149-b5521bf072f6:13
> Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> DagName: vgarg_20181016113701_f3aa9f8f-b38b-47a8-8149-b5521bf072f6:13
> Vertices:
> Map 1
> Map Operator Tree:
> TableScan
> alias: _dummy_table
> Row Limit Per Split: 1
> Statistics: Num rows: 1 Data size: 10 Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: array(const struct(3,4,'p1'),const struct(4,5,'p2'),const struct(6,9,'p3')) (type: array<struct<col1:int,col2:int,col3:string>>)
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 64 Basic stats: COMPLETE Column stats: COMPLETE
> UDTF Operator
> Statistics: Num rows: 1 Data size: 64 Basic stats: COMPLETE Column stats: COMPLETE
> function name: inline
> Select Operator
> expressions: col1 (type: int), col2 (type: int), col3 (type: string)
> outputColumnNames: _col0, _col1, _col2
> Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> key expressions: _col2 (type: string)
> sort order: +
> Map-reduce partition columns: _col2 (type: string)
> Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
> value expressions: _col0 (type: int), _col1 (type: int)
> Reducer 2
> Execution mode: vectorized
> Reduce Operator Tree:
> Select Operator
> expressions: VALUE._col0 (type: int), VALUE._col1 (type: int), KEY._col2 (type: string)
> outputColumnNames: _col0, _col1, _col2
> Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
> File Output Operator
> compressed: false
> Dp Sort State: PARTITION_SORTED
> Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> name: default.t11
> Stage: Stage-2
> Dependency Collection
> Stage: Stage-0
> Move Operator
> tables:
> partition:
> s
> replace: false
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> name: default.t11
> Stage: Stage-3
> Stats Work
> Basic Stats Work:
> Column Stats Desc:
> Columns: i, j
> Column Types: int, int
> Table: default.t11
> {code}
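> Notice that the explain plan is missing the autogather stats branch: Stage-1
> has no operators that compute column statistics, even though Stage-3 lists a
> Column Stats Desc for i and j.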
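>
> A possible way to check that SDPO is what drops the branch, assuming the t11
> table from the reproducer still exists in the same session: turn the
> optimization off and compare the plan. This is only a sketch; the resulting
> plan and stats were not verified here. If the column stats stay empty, they
> can be computed explicitly with ANALYZE TABLE.
> {code:sql}
> -- Assumption: same session and the t11 table created by the reproducer.
> set hive.optimize.sort.dynamic.partition=false;
> -- Compare this plan with the one above and look for a stats-gathering branch.
> explain insert into t11 partition(s) values(3,4, 'p1'),(4,5, 'p2'),(6,9,'p3');
> -- Fallback: compute column stats explicitly for all partitions.
> analyze table t11 partition(s) compute statistics for columns;
> desc formatted t11 j;
> {code}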