[ https://issues.apache.org/jira/browse/SPARK-17581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15500397#comment-15500397 ]
Apache Spark commented on SPARK-17581:
--------------------------------------

User 'gatorsmile' has created a pull request for this issue:
https://github.com/apache/spark/pull/15136

> Invalidate Statistics After Some ALTER TABLE Commands
> -----------------------------------------------------
>
>                 Key: SPARK-17581
>                 URL: https://issues.apache.org/jira/browse/SPARK-17581
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Xiao Li
>
> In the recent statistics-related work, our focus is on how to generate and
> store the statistics. After `Analyze Table` commands, the statistics will not
> be changed unless users run the command again. However, Hive behaves
> differently. For example, `ALTER TABLE SET LOCATION` will invalidate the
> statistics, including `numRows` and `rawDataSize`.
> {noformat}
> hive> describe formatted t2;
> ...
> Location:           hdfs://6b68a24121f4:9000/user/hive/warehouse/t2
> Table Type:         MANAGED_TABLE
> Table Parameters:
>     COLUMN_STATS_ACCURATE   true
>     numFiles                4
>     numRows                 2
>     rawDataSize             2
>     totalSize               4
>     transient_lastDdlTime   1464590855
> ...
> {noformat}
> {noformat}
> hive> alter table t2 set location 'hdfs://6b68a24121f4:9000/user/hive/warehouse/t1';
> OK
> Time taken: 0.113 seconds
> {noformat}
> {noformat}
> hive> describe formatted t2;
> ...
> Location:           hdfs://6b68a24121f4:9000/user/hive/warehouse/t1
> Table Type:         MANAGED_TABLE
> Table Parameters:
>     COLUMN_STATS_ACCURATE   false
>     last_modified_by        root
>     last_modified_time      1474178025
>     numFiles                4
>     numRows                 -1
>     rawDataSize             -1
>     totalSize               4
> ...
> {noformat}
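
For readers skimming the Hive output above, the change it shows in the table parameters can be sketched as a small toy model. This is only an illustration of the observed before/after values, not Hive's or Spark's implementation and not the code in the linked pull request. The helper name `invalidate_stats` and the constant `ROW_LEVEL_STATS` are hypothetical. The sketch assumes that only `COLUMN_STATS_ACCURATE`, `numRows`, and `rawDataSize` change, as in the listing, and it ignores the `last_modified_*` and `transient_lastDdlTime` entries.

```python
from typing import Dict

# Hypothetical names for illustration only; not part of Hive or Spark.
# These are the two statistics the issue says are invalidated.
ROW_LEVEL_STATS = ("numRows", "rawDataSize")


def invalidate_stats(params: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the table parameters with row-level stats invalidated.

    Mirrors the Hive output quoted in the issue: the accuracy flag becomes
    "false" and the row-level statistics become "-1". File-level entries
    (numFiles, totalSize) are left untouched.
    """
    updated = dict(params)  # do not mutate the caller's mapping
    updated["COLUMN_STATS_ACCURATE"] = "false"
    for key in ROW_LEVEL_STATS:
        if key in updated:
            updated[key] = "-1"
    return updated


# Parameters of t2 before ALTER TABLE ... SET LOCATION, taken from the issue.
before = {
    "COLUMN_STATS_ACCURATE": "true",
    "numFiles": "4",
    "numRows": "2",
    "rawDataSize": "2",
    "totalSize": "4",
}

after = invalidate_stats(before)
for name in sorted(after):
    print(f"{name:24}{after[name]}")
```

The sketch returns a new mapping rather than editing the input, so the parameters from before and after the command can be compared side by side, as in the two `describe formatted` listings.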