[ 
https://issues.apache.org/jira/browse/HIVE-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896634#action_12896634
 ] 

Ning Zhang commented on HIVE-1361:
----------------------------------

Ahmed has put up the design doc on the wiki: 
http://wiki.apache.org/hadoop/Hive/StatsDev.

Ahmed is also finalizing the patch for review. 

There are some minor changes from the original requirement: the stats currently 
gathered are the number of rows, total size in bytes, number of files, and 
number of partitions (for a table). The patch does not include the min/max/avg 
of row/file sizes, because the raw sizes (serialized and compressed) differ from 
the sizes we see during stats gathering (deserialized and decompressed). There 
are also no strong use cases for them currently, so we'll exclude them from 
this patch. 
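
To make the reduced set of stats concrete, here is a minimal sketch of how the 
four counters could be accumulated per partition and rolled up per table. It is 
not taken from the patch: the class names, method names, and the "ds=..." 
partition keys are all hypothetical, and plain Python is used only for 
illustration. 

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PartitionStats:
    """Hypothetical holder for the per-partition counters listed above."""
    num_rows: int = 0
    total_size: int = 0  # total size in bytes
    num_files: int = 0

    def add_file(self, rows: int, size_bytes: int) -> None:
        # Each written file contributes its row count and byte size.
        self.num_rows += rows
        self.total_size += size_bytes
        self.num_files += 1


@dataclass
class TableStats:
    """Hypothetical table-level view: partition stats plus a partition count."""
    partitions: Dict[str, PartitionStats] = field(default_factory=dict)

    def add_file(self, partition: str, rows: int, size_bytes: int) -> None:
        # Create the partition entry on first use, then update its counters.
        stats = self.partitions.setdefault(partition, PartitionStats())
        stats.add_file(rows, size_bytes)

    @property
    def num_partitions(self) -> int:
        return len(self.partitions)

    def totals(self) -> PartitionStats:
        # Roll the partition-level counters up to a single table-level summary.
        total = PartitionStats()
        for stats in self.partitions.values():
            total.num_rows += stats.num_rows
            total.total_size += stats.total_size
            total.num_files += stats.num_files
        return total
```

Under these assumptions, calling add_file once per written file and then 
totals() gives the table-level row count, byte size, and file count, while 
num_partitions gives the extra table-level stat. 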

> table/partition level statistics
> --------------------------------
>
>                 Key: HIVE-1361
>                 URL: https://issues.apache.org/jira/browse/HIVE-1361
>             Project: Hadoop Hive
>          Issue Type: Sub-task
>    Affects Versions: 0.6.0
>            Reporter: Ning Zhang
>            Assignee: Ahmed M Aly
>
> As a first step, we gather table-level stats for non-partitioned tables and 
> partition-level stats for partitioned tables. Future work could extend the 
> table-level stats to partitioned tables as well. 
> There are 3 major milestones in this subtask: 
>  1) extend the insert statement to gather table/partition level stats 
> on-the-fly.
>  2) extend metastore API to support storing and retrieving stats for a 
> particular table/partition. 
>  3) add an ANALYZE TABLE [PARTITION] statement in Hive QL to gather stats for 
> existing tables/partitions. 
> The proposed stats are:
> Partition-level stats: 
>   - number of rows
>   - total size in bytes
>   - number of files
>   - max, min, average row sizes
>   - max, min, average file sizes
> Table-level stats in addition to partition level stats:
>   - number of partitions

