[ https://issues.apache.org/jira/browse/HIVE-15653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15830719#comment-15830719 ]
Alexander Behm edited comment on HIVE-15653 at 1/19/17 10:28 PM:
-----------------------------------------------------------------
[~ctang.ma] Impala calls the Metastore API alter_table(). I tried the following
alterations in Impala and those did wipe the table stats:
ALTER TABLE ADD COLUMNS
ALTER TABLE CHANGE COLUMN
ALTER TABLE SET TBLPROPERTIES
ALTER TABLE SET SERDEPROPERTIES
ALTER TABLE SET LOCATION
ALTER TABLE SET FILEFORMAT
ALTER TABLE SET CACHED
So I would say most ALTER commands issued from Impala do wipe the stats. I just
want to make sure the fix on the Hive side is complete, i.e. that the
alter_table() API call on the Metastore is fixed and not just the Hive DDL
commands.
The ALTER TABLE RENAME command worked fine (preserved table stats).
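
For anyone re-checking these cases, here is a minimal Python sketch of the comparison, assuming the table parameters shown by DESCRIBE FORMATTED have been captured into dicts before and after the ALTER. The helper name wiped_stats and the "before" values are hypothetical (they are not part of Hive or Impala, and the pre-ALTER values were not recorded in this issue); the "after" values are copied from the repro output in the issue description. The sketch treats -1 as "stats unknown".

```python
# Hypothetical helper for illustration only; not part of Hive or Impala.
STATS_KEYS = ("numRows", "rawDataSize")


def wiped_stats(before, after):
    """Return the stats keys that were valid before the ALTER but are -1 after it."""
    wiped = []
    for key in STATS_KEYS:
        old = int(before.get(key, -1))
        new = int(after.get(key, -1))
        if old >= 0 and new < 0:
            wiped.append(key)
    return wiped


# Table parameters as strings, the way DESCRIBE FORMATTED prints them.
# "before" is an assumed state after ANALYZE TABLE; "after" is from the repro.
before = {"numFiles": "1", "numRows": "1", "rawDataSize": "1", "totalSize": "2"}
after = {"numFiles": "1", "numRows": "-1", "rawDataSize": "-1",
         "totalSize": "2", "test": "test"}

print(wiped_stats(before, after))
```

An empty list would mean the ALTER preserved the stats, which is what I saw for ALTER TABLE RENAME.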
> Some ALTER TABLE commands drop table stats
> ------------------------------------------
>
> Key: HIVE-15653
> URL: https://issues.apache.org/jira/browse/HIVE-15653
> Project: Hive
> Issue Type: Bug
> Components: Metastore
> Affects Versions: 1.1.0
> Reporter: Alexander Behm
> Assignee: Chaoyu Tang
> Priority: Critical
> Attachments: HIVE-15653.patch
>
>
> Some ALTER TABLE commands drop the table stats. That may make sense for some
> ALTER TABLE operations, but certainly not for others. Personally, I think
> ALTER TABLE should only change what was requested by the user without any
> side effects that may be unclear to users. In particular, collecting stats
> can be an expensive operation so it's rather inconvenient for users if they
> get wiped accidentally.
> Repro:
> {code}
> create table t (i int);
> insert into t values(1);
> analyze table t compute statistics;
> alter table t set tblproperties('test'='test');
> hive> describe formatted t;
> OK
> # col_name data_type comment
>
> i int
>
> # Detailed Table Information
> Database: default
> Owner: abehm
> CreateTime: Tue Jan 17 18:13:34 PST 2017
> LastAccessTime: UNKNOWN
> Protect Mode: None
> Retention: 0
> Location: hdfs://localhost:20500/test-warehouse/t
> Table Type: MANAGED_TABLE
> Table Parameters:
> COLUMN_STATS_ACCURATE false
> last_modified_by abehm
> last_modified_time 1484705748
> numFiles 1
> numRows -1
> rawDataSize -1
> test test
> totalSize 2
> transient_lastDdlTime 1484705748
>
> # Storage Information
> SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>
> InputFormat: org.apache.hadoop.mapred.TextInputFormat
> OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Compressed: No
> Num Buckets: -1
> Bucket Columns: []
> Sort Columns: []
> Storage Desc Params:
> serialization.format 1
> Time taken: 0.169 seconds, Fetched: 34 row(s)
> {code}
> The same behavior can be observed with several other ALTER TABLE commands.