[ https://issues.apache.org/jira/browse/SPARK-17581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15500397#comment-15500397 ]

Apache Spark commented on SPARK-17581:
--------------------------------------

User 'gatorsmile' has created a pull request for this issue:
https://github.com/apache/spark/pull/15136

> Invalidate Statistics After Some ALTER TABLE Commands
> -----------------------------------------------------
>
>                 Key: SPARK-17581
>                 URL: https://issues.apache.org/jira/browse/SPARK-17581
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Xiao Li
>
> Recent statistics-related work has focused on how to generate and store
> statistics. After an `ANALYZE TABLE` command, the statistics do not change
> unless users run the command again. Hive behaves differently. For example,
> `ALTER TABLE SET LOCATION` invalidates the statistics: `numRows` and
> `rawDataSize` are reset to `-1`, and `COLUMN_STATS_ACCURATE` becomes `false`.
> {noformat}
> hive> describe formatted t2;
> ...
> Location:             hdfs://6b68a24121f4:9000/user/hive/warehouse/t2  
> Table Type:           MANAGED_TABLE            
> Table Parameters:              
>       COLUMN_STATS_ACCURATE   true                
>       numFiles                4                   
>       numRows                 2                   
>       rawDataSize             2                   
>       totalSize               4                   
>       transient_lastDdlTime   1464590855          
> ...
> {noformat}
> {noformat}
> hive> alter table t2 set location 
> 'hdfs://6b68a24121f4:9000/user/hive/warehouse/t1';
> OK
> Time taken: 0.113 seconds
> {noformat}
> {noformat}
> hive> describe formatted t2;
> ...                    
> Location:             hdfs://6b68a24121f4:9000/user/hive/warehouse/t1  
> Table Type:           MANAGED_TABLE            
> Table Parameters:              
>       COLUMN_STATS_ACCURATE   false               
>       last_modified_by        root                
>       last_modified_time      1474178025          
>       numFiles                4                   
>       numRows                 -1                  
>       rawDataSize             -1                  
>       totalSize               4                   
> ...
> {noformat}
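The invalidation rule suggested by the Hive output above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Hive's or Spark's implementation: `Table`, `set_location`, and `ROW_STATS` are hypothetical names. The sketch models only the parameters visible in the example. It resets `numRows` and `rawDataSize` to `-1`, flips `COLUMN_STATS_ACCURATE` to `false`, and leaves `numFiles` and `totalSize` untouched, as Hive did in that output. It does not add the `last_modified_by`/`last_modified_time` entries.

```python
from dataclasses import dataclass, field, replace
from typing import Dict


@dataclass
class Table:
    # Hypothetical, simplified view of a catalog table entry.
    name: str
    location: str
    parameters: Dict[str, str] = field(default_factory=dict)


# Row-level statistics that Hive reset to -1 in the example above.
ROW_STATS = ("numRows", "rawDataSize")


def set_location(table: Table, new_location: str) -> Table:
    """Return a copy of `table` at `new_location` with stale statistics invalidated."""
    params = dict(table.parameters)  # copy so the original table is unchanged
    for key in ROW_STATS:
        if key in params:
            params[key] = "-1"  # treated here as "unknown", mirroring the Hive output
    if "COLUMN_STATS_ACCURATE" in params:
        params["COLUMN_STATS_ACCURATE"] = "false"
    return replace(table, location=new_location, parameters=params)


# Usage, reproducing the t2 example from the issue.
t2 = Table(
    "t2",
    "hdfs://6b68a24121f4:9000/user/hive/warehouse/t2",
    {
        "COLUMN_STATS_ACCURATE": "true",
        "numFiles": "4",
        "numRows": "2",
        "rawDataSize": "2",
        "totalSize": "4",
    },
)
moved = set_location(t2, "hdfs://6b68a24121f4:9000/user/hive/warehouse/t1")
print(moved.parameters)
# {'COLUMN_STATS_ACCURATE': 'false', 'numFiles': '4', 'numRows': '-1', 'rawDataSize': '-1', 'totalSize': '4'}
```

Returning a new `Table` instead of mutating in place keeps the pre-change statistics available for comparison, which is convenient when checking which values an `ALTER TABLE` command should invalidate.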



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
