extend table statistics to store the size of uncompressed data (+extend 
interfaces for collecting other types of statistics)
----------------------------------------------------------------------------------------------------------------------------

                 Key: HIVE-2185
                 URL: https://issues.apache.org/jira/browse/HIVE-2185
             Project: Hive
          Issue Type: New Feature
          Components: Serializers/Deserializers, Statistics
            Reporter: Tomasz Nykiel
            Assignee: Tomasz Nykiel


Currently, when executing INSERT OVERWRITE and ANALYZE TABLE commands, we 
collect statistics about the number of rows per partition/table. Other 
statistics (e.g., total table/partition size) are derived from the file system. 

Here, we want to also collect the size of the uncompressed data, so that we 
can determine the efficiency of compression.

Currently, a large part of the statistics collection mechanism is hardcoded 
and not easily extensible to other statistics. In addition to adding the new 
statistic, it would be desirable to extend the collection mechanism so that 
any further statistics can be added easily.
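
To make the request concrete, here is a minimal sketch of what a collection 
mechanism keyed by statistic name might look like. It is not taken from the 
Hive code base: the class StatsSketch, the nested Collector, and the keys 
"numRows" and "rawDataSize" are all hypothetical names chosen for 
illustration. The idea is that adding a statistic means adding a key rather 
than a new hardcoded field, and that compression efficiency can then be 
estimated as uncompressed size divided by on-disk size.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; none of these names come from the Hive code base.
final class StatsSketch {

    // Illustrative statistic keys (assumed names).
    static final String ROW_COUNT = "numRows";
    static final String RAW_DATA_SIZE = "rawDataSize";

    /** Accumulates named counters, so a new statistic only needs a new key. */
    static final class Collector {
        private final Map<String, Long> counters = new LinkedHashMap<>();

        /** Adds delta to the named statistic, creating it on first use. */
        void add(String statName, long delta) {
            counters.merge(statName, delta, Long::sum);
        }

        /** Returns the current value, or 0 if the statistic was never updated. */
        long get(String statName) {
            return counters.getOrDefault(statName, 0L);
        }

        /** Returns a read-only copy of all statistics collected so far. */
        Map<String, Long> snapshot() {
            return Collections.unmodifiableMap(new LinkedHashMap<>(counters));
        }
    }

    /** Uncompressed bytes divided by bytes on disk; higher means better compression. */
    static double compressionRatio(long rawDataSize, long onDiskSize) {
        if (onDiskSize <= 0) {
            throw new IllegalArgumentException("onDiskSize must be positive");
        }
        return (double) rawDataSize / onDiskSize;
    }

    public static void main(String[] args) {
        Collector stats = new Collector();
        // Pretend two rows were serialized, with uncompressed sizes of 120 and 80 bytes.
        for (long rowBytes : new long[] {120L, 80L}) {
            stats.add(ROW_COUNT, 1);
            stats.add(RAW_DATA_SIZE, rowBytes);
        }
        long onDiskSize = 50L; // in practice this would be read from the file system
        System.out.println(stats.snapshot());
        System.out.println(compressionRatio(stats.get(RAW_DATA_SIZE), onDiskSize));
    }
}
```

With a design along these lines, the row count and the uncompressed size go 
through the same code path, and the on-disk size can still be derived from 
the file system as it is today.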

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
