conor garvey gelvin created HIVE-7501:
-----------------------------------------

             Summary: Automatic Aggregations in Partitioned Tables
                 Key: HIVE-7501
                 URL: https://issues.apache.org/jira/browse/HIVE-7501
             Project: Hive
          Issue Type: Improvement
          Components: Database/Schema
            Reporter: conor garvey gelvin


Aggregations are fundamental to OLAP systems because they provide the large speedups that real-world database applications need. The number of possible aggregations (count, sum, max, among others) grows with the product of all aggregatable dimensions in a table, so computing them in their entirety takes an infeasible amount of time. Only a subset can be precomputed, and memory constraints are a further reason to keep that subset small. Manually selecting the subset to compute and save for future use is also not entirely acceptable for modern systems, as doing so is not a trivial task for the user. An automatic way to compute the aggregations is therefore desirable.

Proposal:
In a partitioned table, the results of Hive's built-in aggregate functions are saved in a table, per partition, the first time the user asks for that aggregated data. This provides a mechanism for overnight aggregation: the user computes the aggregation once overnight, and during the day, data-mining queries use the saved aggregates automatically.

Critique, suggestions and development welcome.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
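
To make the proposal above more concrete, here is a minimal in-memory sketch in Python of the idea of saving per-partition aggregate results on first request and reusing them afterwards. It is not Hive code and not a proposed implementation: the class name `PartitionAggregateCache`, its methods, and the dict-of-rows table layout are all hypothetical, chosen only for illustration. It assumes every partition is non-empty and that partition data does not change after it has been aggregated.

```python
class PartitionAggregateCache:
    """Hypothetical sketch: save per-partition aggregates on first use."""

    # For each aggregate: (function applied to one partition's values,
    #                      function that merges the per-partition results).
    _AGGS = {
        "count": (len, sum),
        "sum": (sum, sum),
        "max": (max, max),
        "min": (min, min),
    }

    def __init__(self, partitions):
        # partitions: dict mapping a partition key to a list of row dicts.
        self.partitions = partitions
        # Saved results, keyed by (partition key, aggregate name, column).
        self.cache = {}
        # Number of partition scans performed, kept only for illustration.
        self.scans = 0

    def _partial(self, part, func, column):
        """Return the aggregate for one partition, scanning it at most once."""
        key = (part, func, column)
        if key not in self.cache:
            compute, _ = self._AGGS[func]
            values = [row[column] for row in self.partitions[part]]
            self.cache[key] = compute(values)
            self.scans += 1
        return self.cache[key]

    def aggregate(self, func, column, parts=None):
        """Aggregate over the given partitions (all partitions by default)."""
        parts = list(self.partitions) if parts is None else parts
        _, merge = self._AGGS[func]
        return merge([self._partial(p, func, column) for p in parts])
```

In this sketch the first call to `aggregate("sum", "amount")` scans every partition, while a repeated call is answered from the saved per-partition results alone. Only aggregates whose partial results can be merged (count, sum, max, min) are covered; something like an average would have to be derived from a saved sum and count.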