[ 
https://issues.apache.org/jira/browse/HIVE-15653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15847450#comment-15847450
 ] 

Chaoyu Tang commented on HIVE-15653:
------------------------------------

[~alex.behm] The patch provided here mainly fixes some Hive ALTER TABLE 
DDLs so that they do not drop table stats accidentally. There was not much 
change in HMS. However, HIVE-12730 introduced some new HMS APIs (e.g. void 
alter_table_with_environmentContext) that allow the caller to pass the flag 
StatsSetupConst.DO_NOT_UPDATE_STATS set to true, telling HMS not to update 
the stats. These new APIs could probably be used in Impala. BTW, 
StatsSetupConst.STATS_GENERATED_VIA_STATS_TASK was dropped in HIVE-12730.
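
To illustrate the intent of the flag, here is a minimal, self-contained toy 
model in plain Java. It is not Hive or HMS code: the alterTable method and 
the map-based table parameters are hypothetical stand-ins, and the string 
value used for DO_NOT_UPDATE_STATS is an assumption. It only sketches the 
idea that a caller-supplied environment context can tell the metastore to 
leave existing stats untouched, using the parameter names from the repro 
in the issue description.

```java
import java.util.HashMap;
import java.util.Map;

public class DoNotUpdateStatsSketch {
    // Named after StatsSetupConst.DO_NOT_UPDATE_STATS; the string value is an assumption.
    static final String DO_NOT_UPDATE_STATS = "DO_NOT_UPDATE_STATS";

    /**
     * Hypothetical stand-in for a metastore alter-table call.
     * Applies the requested table properties and, unless the environment
     * context carries DO_NOT_UPDATE_STATS=true, invalidates the basic stats
     * the way the repro in this issue shows.
     */
    static Map<String, String> alterTable(Map<String, String> currentParams,
                                          Map<String, String> requestedParams,
                                          Map<String, String> envContext) {
        Map<String, String> result = new HashMap<>(currentParams);
        result.putAll(requestedParams);

        boolean keepStats = "true".equalsIgnoreCase(envContext.get(DO_NOT_UPDATE_STATS));
        if (!keepStats) {
            // Models the reported side effect: row-level stats are reset.
            result.put("numRows", "-1");
            result.put("rawDataSize", "-1");
            result.put("COLUMN_STATS_ACCURATE", "false");
        }
        return result;
    }

    public static void main(String[] args) {
        // Table parameters as they might look after ANALYZE TABLE (illustrative values).
        Map<String, String> analyzed = new HashMap<>();
        analyzed.put("numRows", "1");
        analyzed.put("rawDataSize", "1");
        analyzed.put("COLUMN_STATS_ACCURATE", "true");

        Map<String, String> change = new HashMap<>();
        change.put("test", "test");

        Map<String, String> withFlag = new HashMap<>();
        withFlag.put(DO_NOT_UPDATE_STATS, "true");

        Map<String, String> kept = alterTable(analyzed, change, withFlag);
        Map<String, String> wiped = alterTable(analyzed, change, new HashMap<>());

        System.out.println("numRows with flag:    " + kept.get("numRows"));
        System.out.println("numRows without flag: " + wiped.get("numRows"));
    }
}
```

In a real client, the flag would be carried in the environment context 
passed to the HMS API mentioned above rather than in a plain map.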

> Some ALTER TABLE commands drop table stats
> ------------------------------------------
>
>                 Key: HIVE-15653
>                 URL: https://issues.apache.org/jira/browse/HIVE-15653
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore
>    Affects Versions: 1.1.0
>            Reporter: Alexander Behm
>            Assignee: Chaoyu Tang
>            Priority: Critical
>             Fix For: 2.2.0
>
>         Attachments: HIVE-15653.1.patch, HIVE-15653.2.patch, 
> HIVE-15653.3.patch, HIVE-15653.4.patch, HIVE-15653.5.patch, 
> HIVE-15653.6.patch, HIVE-15653.patch
>
>
> Some ALTER TABLE commands drop the table stats. That may make sense for some 
> ALTER TABLE operations, but certainly not for others. Personally, I think 
> ALTER TABLE should only change what was requested by the user without any 
> side effects that may be unclear to users. In particular, collecting stats 
> can be an expensive operation so it's rather inconvenient for users if they 
> get wiped accidentally.
> Repro:
> {code}
> create table t (i int);
> insert into t values(1);
> analyze table t compute statistics;
> alter table t set tblproperties('test'='test');
> hive> describe formatted t;
> OK
> # col_name                    data_type               comment             
>                
> i                     int                                         
>                
> # Detailed Table Information           
> Database:             default                  
> Owner:                abehm                    
> CreateTime:           Tue Jan 17 18:13:34 PST 2017     
> LastAccessTime:       UNKNOWN                  
> Protect Mode:         None                     
> Retention:            0                        
> Location:             hdfs://localhost:20500/test-warehouse/t  
> Table Type:           MANAGED_TABLE            
> Table Parameters:              
>       COLUMN_STATS_ACCURATE   false               
>       last_modified_by        abehm               
>       last_modified_time      1484705748          
>       numFiles                1                   
>       numRows                 -1                  
>       rawDataSize             -1                  
>       test                    test                
>       totalSize               2                   
>       transient_lastDdlTime   1484705748          
>                
> # Storage Information          
> SerDe Library:        org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> InputFormat:          org.apache.hadoop.mapred.TextInputFormat
> OutputFormat:         org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Compressed:           No                       
> Num Buckets:          -1                       
> Bucket Columns:       []                       
> Sort Columns:         []                       
> Storage Desc Params:           
>       serialization.format    1                   
> Time taken: 0.169 seconds, Fetched: 34 row(s)
> {code}
> The same behavior can be observed with several other ALTER TABLE commands.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
