[ https://issues.apache.org/jira/browse/HIVE-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13799459#comment-13799459 ]
Prasanth J commented on HIVE-5483:
----------------------------------

[~ashutoshc] In scenarios where metastore column stats are not available, do you think we can fall back to the file format and see whether it exposes column-level statistics? The ORC reader provides an interface for column statistics. To make this more generic, I think we can add a new interface, such as StatsProvidingRecordReader, whose implementations expose file and column statistics. We can fall back to this record reader if the metastore stats are unavailable or stale.

Since there are two sources of truth (file and metastore), there are two possibilities:
1) Check the metastore; if column stats are not available, fall back to the file format.
2) Keep the metastore as the only source of truth and make sure it is always consistent with the underlying file format (currently we don't ensure this).

Another thing that can be fixed: there is some redundancy between the stats gathered by the file format and those computed by the analyze command. If the file format already gathers file-level and column-level statistics, the analyze command should read them from the file format instead of recomputing them, since reading them is far cheaper.

> use metastore statistics to optimize max/min/etc. queries
> ---------------------------------------------------------
>
>                 Key: HIVE-5483
>                 URL: https://issues.apache.org/jira/browse/HIVE-5483
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Sergey Shelukhin
>            Assignee: Ashutosh Chauhan
>         Attachments: HIVE-5483.patch
>
>
> We have discussed this a little bit.
> Hive can answer queries such as select max(c1) from t purely from metastore
> using partition statistics, provided that we know the statistics are up to
> date.
> All data changes (e.g. adding new partitions) currently go thru metastore so
> we can track up-to-date-ness. If they are not up-to-date, the queries will
> have to read data (at least for outdated partitions) until someone runs
> analyze table.
> We can also analyze new partitions after add, if that is
> configured/specified in the command.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
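
For illustration, possibility 1 from the comment (check the metastore, then fall back to the file format) could look roughly like the sketch below. Only the name StatsProvidingRecordReader comes from the comment. The method signature, ColumnStats, MetastoreStatsSource and StatsResolver are assumptions made up for this example and are not Hive or ORC APIs.

```java
import java.util.Optional;

/** Hypothetical summary of one column's statistics (not a Hive class). */
final class ColumnStats {
    final long min;
    final long max;
    final long numValues;

    ColumnStats(long min, long max, long numValues) {
        this.min = min;
        this.max = max;
        this.numValues = numValues;
    }
}

/** Assumed shape of the proposed interface: a reader that reports stats kept by the file format. */
interface StatsProvidingRecordReader {
    /** Returns stats for the column, or empty if the file format does not record them. */
    Optional<ColumnStats> getColumnStats(String column);
}

/** Hypothetical stand-in for a metastore lookup. */
interface MetastoreStatsSource {
    /** Returns stats for the column, or empty if they are missing or known to be stale. */
    Optional<ColumnStats> getColumnStats(String column);
}

final class StatsResolver {
    private StatsResolver() {}

    /** Possibility 1: consult the metastore first, then fall back to the file format. */
    static Optional<ColumnStats> resolve(String column,
                                         MetastoreStatsSource metastore,
                                         StatsProvidingRecordReader reader) {
        Optional<ColumnStats> fromMetastore = metastore.getColumnStats(column);
        if (fromMetastore.isPresent()) {
            return fromMetastore;
        }
        // Metastore stats are unavailable or stale: ask the file format instead.
        return reader.getColumnStats(column);
    }
}
```

A caller answering select max(c1) from t would use the max field when resolve returns a value, and read the data only when both sources come back empty.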