Xiao Li created SPARK-17581:
-------------------------------

             Summary: Invalidate Statistics After Some ALTER TABLE Commands
                 Key: SPARK-17581
                 URL: https://issues.apache.org/jira/browse/SPARK-17581
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.1.0
            Reporter: Xiao Li


Recent statistics-related work has focused on how to generate and store 
statistics. Once an `ANALYZE TABLE` command has run, the statistics do not 
change unless the user runs the command again. Hive behaves differently: 
for example, `ALTER TABLE SET LOCATION` invalidates the statistics, 
including `numRows` and `rawDataSize`.
{noformat}
hive> describe formatted t2;
...
Location:               hdfs://6b68a24121f4:9000/user/hive/warehouse/t2  
Table Type:             MANAGED_TABLE            
Table Parameters:                
        COLUMN_STATS_ACCURATE   true                
        numFiles                4                   
        numRows                 2                   
        rawDataSize             2                   
        totalSize               4                   
        transient_lastDdlTime   1464590855          
...
{noformat}
{noformat}
hive> alter table t2 set location 'hdfs://6b68a24121f4:9000/user/hive/warehouse/t1';
OK
Time taken: 0.113 seconds
{noformat}
{noformat}
hive> describe formatted t2;
...                      
Location:               hdfs://6b68a24121f4:9000/user/hive/warehouse/t1  
Table Type:             MANAGED_TABLE            
Table Parameters:                
        COLUMN_STATS_ACCURATE   false               
        last_modified_by        root                
        last_modified_time      1474178025          
        numFiles                4                   
        numRows                 -1                  
        rawDataSize             -1                  
        totalSize               4                   
...
{noformat}
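
To make the Hive behavior above concrete, here is a minimal sketch that models the table parameters as a plain dictionary and applies the same change seen in the two `describe formatted` outputs. This is not Spark or Hive code: the function name `invalidate_stats` and the constant `ROW_LEVEL_STATS` are hypothetical. Whether Spark should reset exactly these keys is an assumption drawn only from the output shown here.

```python
# Hypothetical sketch, not Spark or Hive code: it only models the change in
# table parameters visible in the `describe formatted` output above.

# Parameters that the Hive output shows reset to -1 after SET LOCATION.
ROW_LEVEL_STATS = ("numRows", "rawDataSize")


def invalidate_stats(params):
    """Return a copy of `params` with row-level statistics marked invalid."""
    updated = dict(params)
    updated["COLUMN_STATS_ACCURATE"] = "false"
    for key in ROW_LEVEL_STATS:
        if key in updated:
            updated[key] = "-1"
    # numFiles and totalSize are left untouched, as in the Hive output.
    return updated


before = {
    "COLUMN_STATS_ACCURATE": "true",
    "numFiles": "4",
    "numRows": "2",
    "rawDataSize": "2",
    "totalSize": "4",
}
after = invalidate_stats(before)
```

The function copies its input so the pre-alter parameters remain available for comparison. `numFiles` and `totalSize` keep their values because the Hive output shows them unchanged.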




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
