[ 
https://issues.apache.org/jira/browse/HIVE-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13802986#comment-13802986
 ] 

Ashutosh Chauhan commented on HIVE-5483:
----------------------------------------

Fair points, Prashanth. I think option 2) is better, for two reasons. 
First, not all file formats have this capability, so tying this kind of 
optimization to a particular format should be avoided whenever possible. 
Second, we want the stats in the metastore to be as fresh as possible for 
query planning anyway, so we are already on the path to keeping stats fresh. 
By the way, there is already a way to collect stats quickly, without a full 
scan, for RC (via HIVE-3958). We can do the same for ORC via HIVE-4177.

I also agree we need to streamline our stats collection, stats storage, and 
stats access API.

> use metastore statistics to optimize max/min/etc. queries
> ---------------------------------------------------------
>
>                 Key: HIVE-5483
>                 URL: https://issues.apache.org/jira/browse/HIVE-5483
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Sergey Shelukhin
>            Assignee: Ashutosh Chauhan
>         Attachments: HIVE-5483.patch
>
>
> We have discussed this a little bit.
> Hive can answer queries such as select max(c1) from t purely from metastore 
> using partition statistics, provided that we know the statistics are up to 
> date.
> All data changes (e.g. adding new partitions) currently go through metastore so 
> we can track up-to-date-ness. If they are not up-to-date, the queries will 
> have to read data (at least for outdated partitions) until someone runs 
> analyze table. We can also analyze new partitions after add, if that is 
> configured/specified in the command.
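
The idea in the description above can be sketched roughly as follows. This is 
only an illustration, not Hive code: the PartitionStats record, its 
up_to_date flag, and the scan_max callback are hypothetical names made up for 
the example, and the real metastore schema and planner logic may look quite 
different.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class PartitionStats:
    """Hypothetical per-partition record; not Hive's metastore schema."""
    name: str
    max_c1: Optional[int]  # stored max(c1), None if never analyzed
    up_to_date: bool       # assumed False once data changed after analyze


def max_c1(partitions: Iterable[PartitionStats],
           scan_max: Callable[[str], Optional[int]]) -> Optional[int]:
    """Return max(c1) over all partitions, reading data only when needed."""
    best = None
    for p in partitions:
        if p.up_to_date and p.max_c1 is not None:
            # Fresh statistics: answer from metadata alone.
            candidate = p.max_c1
        else:
            # Stale or missing statistics: fall back to reading the data.
            candidate = scan_max(p.name)
        if candidate is not None and (best is None or candidate > best):
            best = candidate
    return best
```

With all partitions fresh, no data is read at all; each outdated partition 
costs one scan until someone runs analyze table on it.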



--
This message was sent by Atlassian JIRA
(v6.1#6144)
