Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.
The "Hive/ViewDev" page has been changed by JohnSichi. http://wiki.apache.org/hadoop/Hive/ViewDev?action=diff&rev1=6&rev2=7 -------------------------------------------------- * add queryable view support at the SQL language level (specifics of the scoping are under discussion in the Issues section below) * updatable views will not be supported - * make sure views show up anywhere tables can currently be enumerated/searched + * make sure views and their definitions show up anywhere tables can currently be enumerated/searched/described * where relevant, provide additional metadata to allow views to be distinguished from tables Beyond this, we may want to @@ -35, +35 @@ There are similar issues with other kinds of references in the view definition; for example, if a table or function name can be qualified, then the reference should be bound at the time the view is created. Implementing this typically requires expanding the view definition into an explicit form rather than storing the original view definition text directly. Doing this could require adding "unparse" support to the AST model (to be applied after object name resolution takes place), something which is not currently present (and which is also useful to have available in general). + + However, storing both the expanded form and the original view definition text as well can also be useful for both DESCRIBE readability as well as functionality (see later section on ALTER VIEW v RECOMPILE). == Metastore Modeling == @@ -81, +83 @@ However, if later we want to introduce persistent functions, or track column dependencies, this model will be insufficient, and we may need to introduce inheritance, with a DependencyParticipant base class from which tables, columns, functions etc all derive. (Again, need to verify that JDO inheritance will actually support what we want here.) + '''Update 30-Dec-2009''': Based on a design review meeting, we'll start with the bare-minimum MySQL approach (with no metastore support for dependency tracking), then if time allows, add dependency analysis and storage, followed by CASCADE support. + == Dependency Invalidation == What happens when an object is modified underneath a view? For example, suppose a view references a table's column, and then ALTER TABLE is used to drop or replace that column. Note that if the column's datatype changes, the view definition may remain meaningful, but the view's schema may need to be updated to match. Here are two possible options: @@ -89, +93 @@ * '''Lenient''': allow the update to proceed (and maybe warn the user of the impact), potentially leaving the view in an invalid state. Later, when an invalid view definition is referenced, throw a validation exception for the referencing query. This is the approach taken by MySQL. In the case of datatype changes, derived column datatypes already stored in metastore for referencing views would become stale until those views were recreated. Note that besides table modifications, other operations such as CREATE OR REPLACE VIEW have similar issues (since views can reference other views). The lenient approach provides a reasonable solution for the related issue of external tables whose schemas may be dynamic (not sure if we currently support this). + + '''Update 30-Dec-2009''': Based on a design review meeting, we'll start with the lenient approach, without any support for marking objects invalid in the metastore, then follow up with strict support and possibly metastore support for tracking object validity. == View Modification == @@ -99, +105 @@ Note that supporting view modification requires detection of cyclic view definitions, which should be invalid. Whether this detection is carried out at the time of view modification versus reference is dependent on the strict versus lenient approaches to dependency invalidation described above. + '''Update 30-Dec-2009''': Based on a design review meeting, we'll start with an Oracle-style ALTER VIEW v RECOMPILE, which can be used to revalidate a view definition, as well as to re-expand the original definition for clauses such as select *. Then if time allows, we'll follow up with CREATE OR REPLACE VIEW support. (The latter is less important since we're going with the lenient invalidation model, making DROP and re-CREATE possible without having to deal with downstream dependencies.) + == Fast Path Execution == For '''select * from t''', hive supports fast-path execution (skipping Map/Reduce). Is it important for this to work for '''select * from v''' as well? + + '''Update 30-Dec-2009''': Based on feedback in JIRA, we'll leave this as dependent on getting the fast-path working for the underlying filters and projections. == ORDER BY and LIMIT in view definition == SQL:200n prohibits ORDER BY in a view definition, since a view is supposed to be a virtual (unordered) table, not a query alias. However, many DBMS's ignore this rule; for example, MySQL allows ORDER BY, but ignores it in the case where it is superceded by an ORDER BY in the query. Should we prevent ORDER BY? This question also applies to the LIMIT clause. + '''Update 30-Dec-2009''': Based on feedback in JIRA, ORDER BY is important as forward-looking to materialized views. LIMIT may be less important, but we should probably support it too for consistency. +
