[Hadoop Wiki] Update of "Hive/PartitionedViews" by JohnSichi

Apache Wiki Tue, 01 Feb 2011 11:59:13 -0800

The "Hive/PartitionedViews" page has been changed by JohnSichi.
http://wiki.apache.org/hadoop/Hive/PartitionedViews?action=diff&rev1=1&rev2=2

--------------------------------------------------

  
  = Use Cases =
  
- * An administrator wants to create a set of views as a table/column renaming 
layer on top of an existing set of base tables, without disturbing the ETL 
processes which load those tables.  To read-only users, the views should behave 
exactly the same as the underlying tables in every way.  Among other things, 
this means users should be able to browse available partitions.
+  1. An administrator wants to create a set of views as a table/column 
renaming layer on top of an existing set of base tables, without disturbing the 
ETL processes which load those tables.  To read-only users, the views should 
behave exactly the same as the underlying tables in every way.  Among other 
things, this means users should be able to browse available partitions.
- * A base table is partitioned on columns (ds,hr) for date and hour.  Besides 
this fine-grained partitioning, users would also like to see a virtual table of 
coarse-grained (date-only) partitioning in which the partition for a given date 
only appears once all of the hour-level partitions of that day have been fully 
loaded.
+  1. A base table is partitioned on columns (ds,hr) for date and hour.  
Besides this fine-grained partitioning, users would also like to see a virtual 
table of coarse-grained (date-only) partitioning in which the partition for a 
given date only appears after all of the hour-level partitions of that day have 
been fully loaded.
- * A view is defined on a complex join+union+aggregation of a number of 
underlying base tables and other views, all of which are themselves 
partitioned.  The top-level view should also be partitioned accordingly, with a 
new partition not appearing until corresponding partitions have been loaded for 
all of the underlying tables.
+  1. A view is defined on a complex join+union+aggregation of a number of 
underlying base tables and other views, all of which are themselves 
partitioned.  The top-level view should also be partitioned accordingly, with a 
new partition not appearing until corresponding partitions have been loaded for 
all of the underlying tables.
  
+ = Approaches =
+ 
+  1. One possible approach mentioned in 
[[https://issues.apache.org/jira/browse/HIVE-1079|HIVE-1079]] is to infer view 
partitions automatically based on the partitions of the underlying tables.  A 
command such as SHOW PARTITIONS could then synthesize virtual partition 
descriptors on the fly.  This is fairly easy to do for use case #1, but 
potentially very difficult for use cases #2 and #3, so we are deferring this 
approach for now.
+  1. Instead, we will require users to explicitly declare view partitioning as 
part of CREATE VIEW, and explicitly manage partition metadata via ALTER VIEW 
{ADD|DROP} PARTITION.  This satisfies all of the use cases, at the cost of 
placing more burden on the user and taking up more metastore space.
+
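
To illustrate what explicit partition management could look like for use case #2, here is a minimal sketch of a helper that an ETL job might run after each load. It is not part of the proposal: the function names, the view name `daily_view`, the assumption of 24 hour-level partitions per day, and the exact text of the generated ALTER VIEW statement are all hypothetical, since the page only names the ALTER VIEW {ADD|DROP} PARTITION command without spelling out its full syntax.

```python
from collections import defaultdict

# Assumption: a day is fully loaded once 24 distinct hr values exist for it.
HOURS_PER_DAY = 24


def complete_dates(base_partitions, hours_per_day=HOURS_PER_DAY):
    """Return the sorted ds values whose hour-level partitions are all present.

    base_partitions is an iterable of (ds, hr) pairs describing the
    partitions currently loaded in the base table.
    """
    hours_by_date = defaultdict(set)
    for ds, hr in base_partitions:
        hours_by_date[ds].add(hr)
    return sorted(
        ds for ds, hours in hours_by_date.items() if len(hours) == hours_per_day
    )


def add_partition_statements(view_name, dates, already_declared=()):
    """Build one ALTER VIEW statement per date not yet declared on the view.

    The statement text is an assumed form of the proposed command.
    """
    declared = set(already_declared)
    return [
        f"ALTER VIEW {view_name} ADD PARTITION (ds='{ds}')"
        for ds in dates
        if ds not in declared
    ]


# Example: one fully loaded day and one day with only its first hour loaded.
loaded = [("2011-01-31", f"{h:02d}") for h in range(24)] + [("2011-02-01", "00")]
ready = complete_dates(loaded)
statements = add_partition_statements("daily_view", ready)
```

In this example only the date with all of its hour-level partitions would be exposed on the view; the partially loaded date stays hidden until its remaining hours arrive. Use case #3 could follow the same pattern by intersecting the ready partitions of every underlying table before issuing the statements.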
