extend table statistics to store the size of uncompressed data (+ extend interfaces for collecting other types of statistics)
-----------------------------------------------------------------------------------------------------------------------------

Key: HIVE-2185
URL: https://issues.apache.org/jira/browse/HIVE-2185
Project: Hive
Issue Type: New Feature
Components: Serializers/Deserializers, Statistics
Reporter: Tomasz Nykiel
Assignee: Tomasz Nykiel

Currently, when executing INSERT OVERWRITE and ANALYZE TABLE commands, we collect statistics about the number of rows per partition/table. Other statistics (e.g., total table/partition size) are derived from the file system. Here, we want to also collect the size of the uncompressed data, so that the efficiency of compression can be determined by comparing it with the size on the file system.

Currently, a large part of the statistics collection mechanism is hardcoded and not easily extensible to other statistics. In addition to adding the new statistic, it would be desirable to extend the collection mechanism so that any new statistic can be added easily.
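
To make the request more concrete, here is a minimal sketch of what a keyed, extensible collector could look like. It is not taken from the Hive code base: StatsSketch, StatKey, StatsCollector and compressionRatio are hypothetical names invented for illustration, and the on-disk size is an assumed value standing in for what would be read from the file system. The idea shown is only that each statistic is a key in a map, so adding a new statistic means adding a key rather than changing the collection code.

```java
import java.util.EnumMap;
import java.util.Map;

public class StatsSketch {

    /** Hypothetical statistic keys; these are not Hive's actual identifiers. */
    enum StatKey { ROW_COUNT, RAW_DATA_SIZE }

    /** Accumulates counters by key, so a new statistic only needs a new key. */
    static final class StatsCollector {
        private final Map<StatKey, Long> counters = new EnumMap<>(StatKey.class);

        /** Adds delta to the running total for the given statistic. */
        void add(StatKey key, long delta) {
            counters.merge(key, delta, Long::sum);
        }

        /** Returns the running total, or 0 if the statistic was never reported. */
        long get(StatKey key) {
            return counters.getOrDefault(key, 0L);
        }
    }

    /** On-disk bytes divided by uncompressed bytes; smaller means better compression. */
    static double compressionRatio(long onDiskBytes, long rawBytes) {
        if (rawBytes <= 0) {
            throw new IllegalArgumentException("rawBytes must be positive");
        }
        return (double) onDiskBytes / rawBytes;
    }

    public static void main(String[] args) {
        StatsCollector stats = new StatsCollector();

        // Assumption: each written row can report its uncompressed length in bytes.
        long[] uncompressedRowSizes = {120, 80, 200};
        for (long size : uncompressedRowSizes) {
            stats.add(StatKey.ROW_COUNT, 1);
            stats.add(StatKey.RAW_DATA_SIZE, size);
        }

        // Assumption: this value would come from the file system, as it does today.
        long onDiskBytes = 100;

        System.out.println("rows = " + stats.get(StatKey.ROW_COUNT));
        System.out.println("raw bytes = " + stats.get(StatKey.RAW_DATA_SIZE));
        System.out.println("compression ratio = "
                + compressionRatio(onDiskBytes, stats.get(StatKey.RAW_DATA_SIZE)));
    }
}
```

In this sketch the writer side only calls add(key, delta), and the consumer side only calls get(key), so neither needs to change when a further statistic is introduced. How this would map onto the existing publishing and aggregation code paths is left open here.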