[ https://issues.apache.org/jira/browse/HIVE-16572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chaoyu Tang reassigned HIVE-16572:
----------------------------------
> Rename a partition should not drop its column stats
> ---------------------------------------------------
>
> Key: HIVE-16572
> URL: https://issues.apache.org/jira/browse/HIVE-16572
> Project: Hive
> Issue Type: Bug
> Components: Statistics
> Reporter: Chaoyu Tang
> Assignee: Chaoyu Tang
>
> The column stats for partition (dummy=1) of table sample_pt are as follows:
> {code}
> hive> describe formatted sample_pt partition (dummy=1) code;
> OK
> # col_name  data_type  min  max  num_nulls  distinct_count  avg_col_len  max_col_len  num_trues  num_falses  comment
>
> code        string               0          303             6.985        7                                   from deserializer
> Time taken: 0.259 seconds, Fetched: 3 row(s)
> {code}
> However, after the partition is renamed, e.g.:
> {code}
> alter table sample_pt partition (dummy=1) rename to partition (dummy=11);
> {code}
> the COLUMN_STATS entries in the partition's COLUMN_STATS_ACCURATE parameter are still all "true", but the column stats themselves have actually been deleted:
> {code}
> hive> describe formatted sample_pt partition (dummy=11);
> OK
> # col_name data_type comment
>
> code string
> description string
> salary int
> total_emp int
>
> # Partition Information
> # col_name data_type comment
>
> dummy int
>
> # Detailed Partition Information
> Partition Value: [11]
> Database: default
> Table: sample_pt
> CreateTime: Thu Mar 30 23:03:59 EDT 2017
> LastAccessTime: UNKNOWN
> Location: file:/user/hive/warehouse/apache/sample_pt/dummy=11
>
> Partition Parameters:
> COLUMN_STATS_ACCURATE {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"code\":\"true\",\"description\":\"true\",\"salary\":\"true\",\"total_emp\":\"true\"}}
> numFiles 1
> numRows 200
> rawDataSize 10228
> totalSize 10428
> transient_lastDdlTime 1490929439
>
> # Storage Information
> SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>
> InputFormat: org.apache.hadoop.mapred.TextInputFormat
> OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Compressed: No
> Num Buckets: -1
> Bucket Columns: []
> Sort Columns: []
> Storage Desc Params:
> serialization.format 1
> Time taken: 6.783 seconds, Fetched: 37 row(s)
> ===
> hive> describe formatted sample_pt partition (dummy=11) code;
> OK
> # col_name data_type comment
>
>
>
> code string from deserializer
>
> Time taken: 9.429 seconds, Fetched: 3 row(s)
> {code}
> The column stats should not be dropped when a partition is renamed.
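To make the expected behaviour concrete, here is a minimal toy sketch. It is not Hive metastore code: `ToyMetastore`, `rename_partition_buggy`, `rename_partition_fixed` and `accurate_but_missing` are hypothetical names, and it assumes column stats are stored per (table, partition name, column), so a rename must re-key them rather than delete them. The sample values are taken from the `describe formatted` output above.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ToyMetastore:
    """Hypothetical in-memory model, NOT Hive's metastore: column stats are
    assumed to be keyed by (table, partition name, column)."""
    partitions: dict = field(default_factory=dict)  # (table, part) -> params
    col_stats: dict = field(default_factory=dict)   # (table, part, col) -> stats

    def _stat_keys(self, table, part):
        # Materialize the key list so the dict can be mutated afterwards.
        return [k for k in self.col_stats if k[:2] == (table, part)]

    def rename_partition_buggy(self, table, old, new):
        # Reported behaviour: params (incl. COLUMN_STATS_ACCURATE) move with
        # the partition, but its column stats are dropped.
        self.partitions[(table, new)] = self.partitions.pop((table, old))
        for key in self._stat_keys(table, old):
            del self.col_stats[key]

    def rename_partition_fixed(self, table, old, new):
        # Expected behaviour: re-key the stats under the new partition name.
        self.partitions[(table, new)] = self.partitions.pop((table, old))
        for key in self._stat_keys(table, old):
            self.col_stats[(table, new, key[2])] = self.col_stats.pop(key)


def accurate_but_missing(ms, table, part):
    """Columns flagged "true" in COLUMN_STATS_ACCURATE that have no stats."""
    params = ms.partitions[(table, part)]
    flags = json.loads(params["COLUMN_STATS_ACCURATE"]).get("COLUMN_STATS", {})
    return [c for c, v in flags.items()
            if v == "true" and (table, part, c) not in ms.col_stats]


def make_store():
    ms = ToyMetastore()
    ms.partitions[("sample_pt", "dummy=1")] = {
        "COLUMN_STATS_ACCURATE": '{"BASIC_STATS":"true","COLUMN_STATS":{"code":"true"}}'
    }
    ms.col_stats[("sample_pt", "dummy=1", "code")] = {
        "num_nulls": 0, "distinct_count": 303, "avg_col_len": 6.985, "max_col_len": 7
    }
    return ms


if __name__ == "__main__":
    buggy = make_store()
    buggy.rename_partition_buggy("sample_pt", "dummy=1", "dummy=11")
    print(accurate_but_missing(buggy, "sample_pt", "dummy=11"))  # ['code']

    fixed = make_store()
    fixed.rename_partition_fixed("sample_pt", "dummy=1", "dummy=11")
    print(accurate_but_missing(fixed, "sample_pt", "dummy=11"))  # []
```

The buggy variant reproduces the inconsistency described in this issue (stats flagged accurate but absent), while the fixed variant keeps the flags and the stored stats in agreement.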