Xiao Li created SPARK-17581:
-------------------------------

             Summary: Invalidate Statistics After Some ALTER TABLE Commands
                 Key: SPARK-17581
                 URL: https://issues.apache.org/jira/browse/SPARK-17581
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.1.0
            Reporter: Xiao Li

The recent statistics-related work has focused on how to generate and store statistics. Once an `ANALYZE TABLE` command has run, the stored statistics do not change until the user runs the command again.

Hive behaves differently. For example, `ALTER TABLE SET LOCATION` invalidates the statistics: it sets `COLUMN_STATS_ACCURATE` to false and resets `numRows` and `rawDataSize` to -1.

{noformat}
hive> describe formatted t2;
...
Location:               hdfs://6b68a24121f4:9000/user/hive/warehouse/t2
Table Type:             MANAGED_TABLE
Table Parameters:
        COLUMN_STATS_ACCURATE   true
        numFiles                4
        numRows                 2
        rawDataSize             2
        totalSize               4
        transient_lastDdlTime   1464590855
...
{noformat}

{noformat}
hive> alter table t2 set location 'hdfs://6b68a24121f4:9000/user/hive/warehouse/t1';
OK
Time taken: 0.113 seconds
{noformat}

{noformat}
hive> describe formatted t2;
...
Location:               hdfs://6b68a24121f4:9000/user/hive/warehouse/t1
Table Type:             MANAGED_TABLE
Table Parameters:
        COLUMN_STATS_ACCURATE   false
        last_modified_by        root
        last_modified_time      1474178025
        numFiles                4
        numRows                 -1
        rawDataSize             -1
        totalSize               4
...
{noformat}
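
As a rough illustration of the invalidation seen in the Hive output, the Python sketch below models the table parameters as a plain dictionary and applies the same changes that `describe formatted` reports after `ALTER TABLE SET LOCATION`. It is not Hive's or Spark's implementation. The function name `set_location` and the dictionary layout are hypothetical, and bookkeeping keys such as `last_modified_by`, `last_modified_time` and `transient_lastDdlTime` are deliberately left out.

```python
# Hypothetical model of the table metadata printed by `describe formatted`.
# Only the parameter names and values are taken from the Hive output; the
# function and the dict layout are assumptions made for illustration.

# Statistics that Hive reset to -1 in the example.
ROW_LEVEL_STATS = ("numRows", "rawDataSize")


def set_location(table, new_location):
    """Return a copy of `table` that points at `new_location`,
    with the row-level statistics marked as unknown."""
    params = dict(table["parameters"])  # copy, so the input is not mutated
    params["COLUMN_STATS_ACCURATE"] = "false"
    for key in ROW_LEVEL_STATS:
        if key in params:
            params[key] = "-1"
    # numFiles and totalSize are left untouched, as in the Hive output.
    return {**table, "location": new_location, "parameters": params}


t2 = {
    "location": "hdfs://6b68a24121f4:9000/user/hive/warehouse/t2",
    "parameters": {
        "COLUMN_STATS_ACCURATE": "true",
        "numFiles": "4",
        "numRows": "2",
        "rawDataSize": "2",
        "totalSize": "4",
    },
}

altered = set_location(t2, "hdfs://6b68a24121f4:9000/user/hive/warehouse/t1")
```

In this sketch `altered["parameters"]` carries `numRows` and `rawDataSize` of -1 and `COLUMN_STATS_ACCURATE` of false, while `numFiles` and `totalSize` keep their old values, matching the second `describe formatted` listing.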