[
https://issues.apache.org/jira/browse/HIVE-16147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15989269#comment-15989269
]
Chaoyu Tang commented on HIVE-16147:
------------------------------------
[~pxiong] Thanks for looking into this. Yes, I made some changes to fix the
test failures and also optimized the code a little. I have uploaded the second
patch to RB and requested a review.
> Rename a partitioned table should not drop its partition columns stats
> ----------------------------------------------------------------------
>
> Key: HIVE-16147
> URL: https://issues.apache.org/jira/browse/HIVE-16147
> Project: Hive
> Issue Type: Bug
> Reporter: Chaoyu Tang
> Assignee: Chaoyu Tang
> Attachments: HIVE-16147.1.patch, HIVE-16147.patch, HIVE-16147.patch
>
>
> When a partitioned table (e.g. sample_pt) is renamed (e.g. to
> sample_pt_rename), describing its partition shows that the partition column
> stats are still accurate, but in fact they have all been dropped.
> It can be reproduced as follows:
> 1. analyze table sample_pt compute statistics for columns;
> 2. describe formatted default.sample_pt partition (dummy = 3): COLUMN_STATS
> for all columns are true
> {code}
> ...
> # Detailed Partition Information
> Partition Value: [3]
> Database: default
> Table: sample_pt
> CreateTime: Fri Jan 20 15:42:30 EST 2017
> LastAccessTime: UNKNOWN
> Location: file:/user/hive/warehouse/apache/sample_pt/dummy=3
> Partition Parameters:
> COLUMN_STATS_ACCURATE
> {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"code\":\"true\",\"description\":\"true\",\"salary\":\"true\",\"total_emp\":\"true\"}}
> last_modified_by ctang
> last_modified_time 1485217063
> numFiles 1
> numRows 100
> rawDataSize 5143
> totalSize 5243
> transient_lastDdlTime 1488842358
> ...
> {code}
> 3. describe formatted default.sample_pt partition (dummy = 3) salary: column
> stats exist
> {code}
> # col_name  data_type  min  max     num_nulls  distinct_count  avg_col_len  max_col_len  num_trues  num_falses  comment
>
> salary      int        1    151370  0          94                                                               from deserializer
> {code}
> 4. alter table sample_pt rename to sample_pt_rename;
> 5. describe formatted default.sample_pt_rename partition (dummy = 3): the
> renamed table's partition still shows COLUMN_STATS as true for all
> columns.
> {code}
> # Detailed Partition Information
> Partition Value: [3]
> Database: default
> Table: sample_pt_rename
> CreateTime: Fri Jan 20 15:42:30 EST 2017
> LastAccessTime: UNKNOWN
> Location: file:/user/hive/warehouse/apache/sample_pt_rename/dummy=3
> Partition Parameters:
> COLUMN_STATS_ACCURATE
> {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"code\":\"true\",\"description\":\"true\",\"salary\":\"true\",\"total_emp\":\"true\"}}
> last_modified_by ctang
> last_modified_time 1485217063
> numFiles 1
> numRows 100
> rawDataSize 5143
> totalSize 5243
> transient_lastDdlTime 1488842358
> {code}
> 6. describe formatted default.sample_pt_rename partition (dummy = 3) salary:
> the column stats have been dropped.
> {code}
> # col_name data_type comment
>
>
>
> salary int from deserializer
>
> Time taken: 0.131 seconds, Fetched: 3 row(s)
> {code}
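> For convenience, the statements from the steps above can be collected into
> a single script. This is only a sketch: it assumes that sample_pt already
> exists in the default database, is partitioned by dummy, and has a
> partition dummy=3 containing data. No other statements are added.
> {code}
> -- Assumption: default.sample_pt exists and has a partition dummy=3.
> -- Compute column stats, then confirm they are present before the rename.
> analyze table sample_pt compute statistics for columns;
> describe formatted default.sample_pt partition (dummy = 3);
> describe formatted default.sample_pt partition (dummy = 3) salary;
> -- Rename the table, then compare the same two describes.
> alter table sample_pt rename to sample_pt_rename;
> describe formatted default.sample_pt_rename partition (dummy = 3);
> describe formatted default.sample_pt_rename partition (dummy = 3) salary;
> {code}
> The mismatch shows up in the last two statements: COLUMN_STATS_ACCURATE
> still reports true for salary, while the column-level describe returns no
> stats.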