Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change 
notification.

The "Hive/ViewDev" page has been changed by JohnSichi.
http://wiki.apache.org/hadoop/Hive/ViewDev?action=diff&rev1=1&rev2=2

--------------------------------------------------

  
  At a minimum, we want to 
  
- * add queryable view support at the SQL language level (specifics of the 
scoping are under discussion in the Issues section below)
+  * add queryable view support at the SQL language level (specifics of the 
scoping are under discussion in the Issues section below)
- ** updatable views will not be supported
+   * updatable views will not be supported
- * make sure views show up anywhere tables can currently be enumerated/searched
+  * make sure views show up anywhere tables can currently be 
enumerated/searched
- * where relevant, provide additional metadata to allow views to be 
distinguished from tables
+  * where relevant, provide additional metadata to allow views to be 
distinguished from tables
  
  Beyond this, we may want to
  
- * expose metadata about view definitions and dependencies (at table-level or 
column-level) in a way that makes them consumable by metadata-driven tools
+  * expose metadata about view definitions and dependencies (at table-level or 
column-level) in a way that makes them consumable by metadata-driven tools
  
  = Implementation Sketch =
  
  The basics of view implementation are very easy due to the fact that Hive 
already supports subselects in the FROM clause.
  
- * For <b>CREATE VIEW v AS view-def-select</b>, we extend SemanticAnalyzer to 
behave similarly to <b>CREATE TABLE t AS select</b>, except that we don't 
actually execute the query (we stop after plan generation).  It's necessary to 
perform all of plan generation (even though we're not actually going to execute 
the plan) since currently some validations such as type compatibility-checking 
are only performed during plan generation.  After successful validation, the 
text of the view is saved in the metastore (the simplest approach snips out the 
text from the parser's token stream, but this approach introduces problems 
described in the issues section below).
+  * For '''CREATE VIEW v AS view-def-select''', we extend SemanticAnalyzer to 
behave similarly to '''CREATE TABLE t AS select''', except that we don't 
actually execute the query (we stop after plan generation).  It's necessary to 
perform all of plan generation (even though we're not actually going to execute 
the plan) since currently some validations such as type compatibility-checking 
are only performed during plan generation.  After successful validation, the 
text of the view is saved in the metastore (the simplest approach snips out the 
text from the parser's token stream, but this approach introduces problems 
described in the issues section below).
- * For <b>select ... from view-reference</b>, we detect the view reference in 
SemanticAnalyzer.getMetaData, load the text of its definition from the 
metastore, parse it back into an AST, prepare a QBExpr to hold it, and then 
plug this into the referencing query's QB, resulting in a tree equivalent to 
<b>select ... from (view-def-select)</b>; plan generation can then be carried 
out on the combined tree.
+  * For '''select ... from view-reference''', we detect the view reference in 
SemanticAnalyzer.getMetaData, load the text of its definition from the 
metastore, parse it back into an AST, prepare a QBExpr to hold it, and then 
plug this into the referencing query's QB, resulting in a tree equivalent to 
'''select ... from (view-def-select)'''; plan generation can then be carried 
out on the combined tree.
  
  = Issues =
  
@@ -56, +56 @@

  
  Comparison of the two approaches:
  
+ || || Inheritance Model || Flat Model ||
- {| border="1"
- |-
- ! 
- ! Inheritance Model
- ! Flat Model
- |-
- ! JDO Support
- | Need to investigate how well inheritance works for our purposes
+ || JDO Support || Need to investigate how well inheritance works for our 
purposes || Nothing special ||
+ || Metadata queries from existing code/tools || Existing queries for tables 
will NOT include views in results; those that need to will have to be modified 
to reference base class instead || Existing queries for tables WILL include 
views in results; those that are not supposed to will need to filter them out ||
+ || Metastore upgrade on deployment || Need to test carefully to make sure 
introducing inheritance doesn't corrupt existing metastore instances || Nothing 
special, just adding a new attribute ||
- | Nothing special
- |-
- ! Metadata queries from existing code/tools
- | Existing queries for tables will NOT include views in results; those that 
need to will have to be modified to reference base class instead
- | Existing queries for tables WILL include views in results; those that are 
not supposed to will need to filter them out
- |-
- ! Metastore upgrade on deployment
- | Need to test carefully to make sure introducing inheritance doesn't corrupt 
existing metastore instances
- | Nothing special, just adding a new attribute
- |}
  
  == Dependency Tracking ==
  
  It's necessary to track dependencies from a view to objects it references in 
the metastore:
  
- * tables:  this is mandatory if we want DROP TABLE to be able to correctly 
CASCADE/RESTRICT to a referencing view
+  * tables:  this is mandatory if we want DROP TABLE to be able to correctly 
CASCADE/RESTRICT to a referencing view
- * other views:  same as tables
+  * other views:  same as tables
- * columns:  this is optional (useful for lineage inspection, but not required 
for implementing SQL features)
+  * columns:  this is optional (useful for lineage inspection, but not 
required for implementing SQL features)
- * temporary functions:  we should disallow these at view creation unless we 
also want a concept of temporary view (or if it's OK for the referencing view 
to become invalid whenever the volatile function registry gets cleared)
+  * temporary functions:  we should disallow these at view creation unless we 
also want a concept of temporary view (or if it's OK for the referencing view 
to become invalid whenever the volatile function registry gets cleared)
- * any other objects? (e.g. udt's coming in as part of 
[http://issues.apache.org/jira/browse/HIVE-779 HIVE-779])
+  * any other objects? (e.g. udt's coming in as part of 
[http://issues.apache.org/jira/browse/HIVE-779 HIVE-779])
  
  (Note that MySQL doesn't actually implement CASCADE/RESTRICT:  it just 
ignores the keyword and drops the table unconditionally, leaving the view 
dangling.)
  
@@ -97, +83 @@

  
  What happens when an object is modified underneath a view?  For example, 
suppose a view references a table's column, and then ALTER TABLE is used to 
drop or replace that column.  Note that if the column's datatype changes, the 
view definition may remain meaningful, but the view's schema may need to be 
updated to match.  Here are two possible options:
  
- * <b>Strict</b>:  prevent operations which would invalidate or change the 
view in any way (and optionally to provide a CASCADE flag which requests that 
such views be dropped automatically).  This is the approach taken by SQL:200n.
+  * '''Strict''':  prevent operations which would invalidate or change the 
view in any way (and optionally to provide a CASCADE flag which requests that 
such views be dropped automatically).  This is the approach taken by SQL:200n.
- * <b>Lenient</b>: allow the update to proceed (and maybe warn the user of the 
impact), potentially leaving the view in an invalid state.  Later, when an 
invalid view definition is referenced, throw a validation exception for the 
referencing query.  This is the approach taken by MySQL.  In the case of 
datatype changes, derived column datatypes already stored in metastore for 
referencing views would become stale until those views were recreated.
+  * '''Lenient''': allow the update to proceed (and maybe warn the user of the 
impact), potentially leaving the view in an invalid state.  Later, when an 
invalid view definition is referenced, throw a validation exception for the 
referencing query.  This is the approach taken by MySQL.  In the case of 
datatype changes, derived column datatypes already stored in metastore for 
referencing views would become stale until those views were recreated.
  
  Note that besides table modifications, other operations such as CREATE OR 
REPLACE VIEW have similar issues (since views can reference other views).  The 
lenient approach provides a reasonable solution for the related issue of 
external tables whose schemas may be dynamic (not sure if we currently support 
this).
  
@@ -106, +92 @@

  
  In SQL:200n, there's no standard way to update a view definition.  MySQL 
supports both
  
- * <b>CREATE OR REPLACE VIEW v AS new-view-def-select</b>
+  * '''CREATE OR REPLACE VIEW v AS new-view-def-select'''
- * <b>ALTER VIEW v AS new-view-def-select</b>
+  * '''ALTER VIEW v AS new-view-def-select'''
  
  Note that supporting view modification requires detection of cyclic view 
definitions, which should be invalid.  Whether this detection is carried out at 
the time of view modification versus reference is dependent on the strict 
versus lenient approaches to dependency invalidation described above.
  
  == Fast Path Execution ==
  
- For <b>select * from t</b>, hive supports fast-path execution (skipping 
Map/Reduce).  Is it important for this to work for <b>select * from v</b> as 
well?
+ For '''select * from t''', hive supports fast-path execution (skipping 
Map/Reduce).  Is it important for this to work for '''select * from v''' as 
well?
  
  == ORDER BY and LIMIT in view definition ==
  

Reply via email to