[ https://issues.apache.org/jira/browse/HIVE-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108714#comment-14108714 ]
Lefty Leverenz commented on HIVE-7654:
--------------------------------------

The attached document is good, but the question is whether to put that information in the wiki or just leave it as an attachment to this JIRA ticket.

If extrapolation is only used behind the scenes, it probably doesn't need to go in the wiki. Or do users need to know about it so they can understand the statistics? In that case it would go in the Statistics doc, where this JIRA ticket is already listed in the Current Status section:

* [Statistics in Hive | https://cwiki.apache.org/confluence/display/Hive/StatsDev]
* [Statistics in Hive -- Current Status (JIRA) | https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-CurrentStatus(JIRA)]

> A method to extrapolate columnStats for partitions of a table
> -------------------------------------------------------------
>
>                 Key: HIVE-7654
>                 URL: https://issues.apache.org/jira/browse/HIVE-7654
>             Project: Hive
>          Issue Type: New Feature
>          Components: Statistics
>            Reporter: pengcheng xiong
>            Assignee: pengcheng xiong
>            Priority: Minor
>             Fix For: 0.14.0
>
>         Attachments: Extrapolate the Column Status.docx, HIVE-7654.0.patch, HIVE-7654.1.patch, HIVE-7654.4.patch, HIVE-7654.6.patch, HIVE-7654.7.patch, HIVE-7654.8.patch, HIVE-7654.9.patch
>
>
> A partitioned table can have many partitions. For example,
> create table if not exists loc_orc (
>   state string,
>   locid int,
>   zip bigint
> ) partitioned by(year string) stored as orc;
> We assume there are 4 partitions: partition(year='2000'), partition(year='2001'), partition(year='2002') and partition(year='2003').
> We can use the following command to compute statistics for columns state,locid of partition(year='2001'):
> analyze table loc_orc partition(year='2001') compute statistics for columns state,locid;
> We need to know the "aggregated" column statistics for the whole table loc_orc.
> However, we may not have the column statistics for some partitions, e.g., partition(year='2002'), and we may also be missing the statistics for some columns, e.g., zip bigint for partition(year='2001').
> We propose a method to extrapolate the missing column statistics for the partitions.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
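
To make the aggregation problem in the issue description concrete, here is a minimal Python sketch of one possible way to fill in a missing partition. It is an assumption for illustration only, not the method from the attached document or the patches: the `ColumnStats` shape, the function name `aggregate_with_extrapolation`, the scaling rule and all the numbers are hypothetical. Only the partition names and the `locid` column come from the example above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ColumnStats:
    """Per-partition statistics for one numeric column (hypothetical shape)."""
    low: int
    high: int
    num_nulls: int


def aggregate_with_extrapolation(
    partitions: List[str],
    known: Dict[str, ColumnStats],
) -> Optional[ColumnStats]:
    """Aggregate column stats over all partitions, estimating missing ones.

    Assumed rule (not taken from HIVE-7654): min and max are taken over the
    partitions that have stats, and additive stats are scaled up in
    proportion to the number of partitions without stats.
    """
    available = [known[p] for p in partitions if p in known]
    if not available:
        return None  # nothing to extrapolate from

    low = min(s.low for s in available)
    high = max(s.high for s in available)

    # Scale the observed null count up to the full partition count.
    observed_nulls = sum(s.num_nulls for s in available)
    estimated_nulls = round(observed_nulls * len(partitions) / len(available))

    return ColumnStats(low, high, estimated_nulls)


# Four partitions as in the issue; year='2002' has no stats for locid.
# The values below are made up for the example.
partitions = ["2000", "2001", "2002", "2003"]
known = {
    "2000": ColumnStats(low=1, high=4, num_nulls=2),
    "2001": ColumnStats(low=2, high=6, num_nulls=0),
    "2003": ColumnStats(low=1, high=5, num_nulls=1),
}

table_stats = aggregate_with_extrapolation(partitions, known)
print(table_stats)
```

With three of the four partitions analyzed, the sketch keeps the observed range and scales the null count by 4/3. A real implementation would also have to handle other statistics, such as the number of distinct values, which cannot simply be summed or scaled this way.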