conor garvey gelvin created HIVE-7501:
-----------------------------------------
Summary: Automatic Aggregations in Partitioned Tables
Key: HIVE-7501
URL: https://issues.apache.org/jira/browse/HIVE-7501
Project: Hive
Issue Type: Improvement
Components: Database/Schema
Reporter: conor garvey gelvin
Precomputed aggregations are considered fundamental to OLAP systems because
they provide the large speedup that real-world database applications need. The
number of possible aggregations (count, sum, max, among others) grows with the
product of all aggregatable dimensions in a table (with d dimensions there are
already 2^d possible group-by combinations per aggregate function), so computing
all of them requires an unfeasible amount of time. Memory constraints are another
reason to keep the precomputed subset small. Selecting that subset manually is
not a satisfactory answer for modern systems either: computing and saving a
chosen aggregation is a trivial task that any user can already do with a simple
HiveQL command.
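A quick Python sketch of the growth described above, assuming hypothetical dimension names and distinct-value counts chosen only for illustration. It enumerates every possible GROUP BY column set and adds up the maximum number of groups each set can produce. That total equals the product of (cardinality + 1) over all dimensions.

```python
from itertools import combinations
from math import prod


def groupby_sets(dims):
    """Every subset of dims, i.e. every possible GROUP BY column list.
    The empty subset is the grand total over the whole table."""
    return [combo for r in range(len(dims) + 1) for combo in combinations(dims, r)]


def max_aggregate_rows(cardinalities):
    """Upper bound on rows needed to store ONE aggregate function for every
    group-by set: a set has at most the product of its dimensions'
    cardinalities distinct groups (prod of an empty set is 1)."""
    dims = list(cardinalities)
    return sum(prod(cardinalities[d] for d in combo) for combo in groupby_sets(dims))


# Hypothetical dimensions and distinct-value counts, for illustration only.
cards = {"day": 365, "store": 100, "product": 1000}
print(len(groupby_sets(list(cards))))  # 8 (= 2**3)
print(max_aggregate_rows(cards))       # 37002966 (= 366 * 101 * 1001)
```

Each additional aggregate function (count, sum, max, ...) multiplies this figure again, which is why only a subset can realistically be precomputed.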
An automatic way to compute the aggregations is therefore desirable.
Proposal: in a partitioned table, the results of built-in Hive aggregate
functions over a partition are saved in a table, per partition, the first time
the user asks for that aggregated data.
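A minimal Python sketch of the proposed behavior, assuming an in-memory dictionary stands in for the per-partition aggregate table and that each partition is a list of rows. The class name, method names, and partition keys are hypothetical and not part of Hive; the Python functions only mimic Hive's built-in COUNT, SUM, MIN and MAX.

```python
# Python stand-ins for Hive's built-in COUNT, SUM, MIN, MAX (hypothetical mapping).
AGG_FUNCS = {
    "count": len,
    "sum": sum,
    "min": min,
    "max": max,
}


class PartitionAggregateCache:
    """Hypothetical sketch: the first request for (partition, function, column)
    scans the partition and saves the result; later requests reuse it."""

    def __init__(self, partitions):
        self.partitions = partitions  # partition key -> list of row dicts
        self.saved = {}               # plays the role of the saved aggregate table
        self.computations = 0         # number of times data was actually scanned

    def aggregate(self, partition, func, column):
        key = (partition, func, column)
        if key not in self.saved:
            values = [row[column] for row in self.partitions[partition]]
            self.saved[key] = AGG_FUNCS[func](values)
            self.computations += 1
        return self.saved[key]


# Hypothetical partitions keyed in Hive's "dt=..." style.
sales = {
    "dt=2014-07-22": [{"amount": 10.0}, {"amount": 5.0}],
    "dt=2014-07-23": [{"amount": 7.0}],
}
cache = PartitionAggregateCache(sales)
cache.aggregate("dt=2014-07-22", "sum", "amount")  # scans the partition: 15.0
cache.aggregate("dt=2014-07-22", "sum", "amount")  # served from the saved result
print(cache.computations)  # 1
```

An overnight job could call `aggregate` for each partition ahead of time, so that daytime queries only read saved results.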
This provides a mechanism for overnight aggregation: the user computes the
aggregation once overnight, and during the day the saved aggregated data is
used automatically for data mining.
Critique, suggestions and development welcome.
--
This message was sent by Atlassian JIRA
(v6.2#6252)