[ https://issues.apache.org/jira/browse/HIVE-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12870229#action_12870229 ]
Ning Zhang commented on HIVE-1362:
----------------------------------

This is the 2nd subtask of HIVE-33 (stats in Hive tables). We will gather column-level stats at users' request. It also depends on HIVE-1361, in that the metastore API should support storing and retrieving stats.

The major milestones for this subtask are:
1) Add a new HiveQL command to gather column-level stats. Please see HIVE-33 for the syntax.
2) Add new UDFs/UDAFs to compute these statistics.

The proposed statistics are:
- number of distinct values
- number of NULL values
- min/max k values, where k could be given by the user
- histogram: frequency and height balanced
- average size of the column
- avg/sum of all values in the column if their type is numerical
- percentiles of the values

> column level statistics
> -----------------------
>
>                 Key: HIVE-1362
>                 URL: https://issues.apache.org/jira/browse/HIVE-1362
>             Project: Hadoop Hive
>          Issue Type: Sub-task
>            Reporter: Ning Zhang
>            Assignee: Ahmed M Aly
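
To make the proposed statistics concrete, here is a minimal single-machine sketch in Python of what each one could compute for a single column. It is not Hive's implementation and says nothing about the eventual UDF/UDAF design. The function names (`column_stats`, `frequency_histogram`, `height_balanced_histogram`) are hypothetical, `None` stands in for NULL, "average size" is assumed to mean the length of the value's string form, percentiles use the nearest-rank rule, and "height balanced" is assumed to mean equi-depth bucket boundaries.

```python
import heapq
import math
from collections import Counter


def column_stats(values, k=3, percentiles=(0.5, 0.9)):
    """Illustrative column-level statistics; None stands for NULL."""
    non_null = [v for v in values if v is not None]
    stats = {
        "num_nulls": len(values) - len(non_null),
        "num_distinct": len(set(non_null)),
        "min_k": heapq.nsmallest(k, non_null),
        "max_k": heapq.nlargest(k, non_null),
        # Assumption: size is the length of the value's string form.
        "avg_size": (sum(len(str(v)) for v in non_null) / len(non_null))
        if non_null else None,
    }
    is_numeric = bool(non_null) and all(
        isinstance(v, (int, float)) and not isinstance(v, bool)
        for v in non_null
    )
    if is_numeric:
        ordered = sorted(non_null)
        n = len(ordered)
        stats["sum"] = sum(ordered)
        stats["avg"] = stats["sum"] / n
        # Nearest-rank percentile: value at rank ceil(p * n), 1-based.
        stats["percentiles"] = {
            p: ordered[max(0, math.ceil(p * n) - 1)] for p in percentiles
        }
    return stats


def frequency_histogram(values):
    """Map each non-NULL value to the number of times it occurs."""
    return dict(Counter(v for v in values if v is not None))


def height_balanced_histogram(values, buckets):
    """Upper boundary of each bucket, with about n / buckets rows per bucket."""
    ordered = sorted(v for v in values if v is not None)
    n = len(ordered)
    if n == 0:
        return []
    return [
        ordered[min(n - 1, math.ceil((i + 1) * n / buckets) - 1)]
        for i in range(buckets)
    ]


if __name__ == "__main__":
    column = [3, 1, None, 4, 1, 5, None, 9, 2, 6]
    print(column_stats(column))
    print(frequency_histogram(column))
    print(height_balanced_histogram(column, buckets=4))
```

A real implementation would have to compute these as distributed aggregates, which is where mergeable or approximate algorithms become relevant; the sketch sorts the whole column in memory purely for clarity.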