[
https://issues.apache.org/jira/browse/SPARK-18209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Wenchen Fan resolved SPARK-18209.
---------------------------------
Resolution: Fixed
> More robust view canonicalization without full SQL expansion
> ------------------------------------------------------------
>
> Key: SPARK-18209
> URL: https://issues.apache.org/jira/browse/SPARK-18209
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: Reynold Xin
> Priority: Critical
>
> Spark SQL currently stores views by analyzing the provided SQL and then
> generating fully expanded SQL out of the analyzed logical plan. This is
> actually a very error prone way of doing it, because:
> 1. It is non-trivial to guarantee that the generated SQL is correct without
> being extremely verbose, given the current set of operators.
> 2. We need extensive testing for all combination of operators.
> 3. Whenever we introduce a new logical plan operator, we need to be super
> careful because it might break SQL generation. This is the main reason
> broadcast join hint has taken forever to be merged because it is very
> difficult to guarantee correctness.
> Given the two primary reasons to do view canonicalization is to provide the
> context for the database as well as star expansion, I think we can this
> through a simpler approach, by taking the user given SQL, analyze it, and
> just wrap the original SQL with a SELECT clause at the outer and store the
> database as a hint.
> For example, given the following view creation SQL:
> {code}
> USE DATABASE my_db;
> CREATE TABLE my_table (id int, name string);
> CREATE VIEW my_view AS SELECT * FROM my_table WHERE id > 10;
> {code}
> We store the following SQL instead:
> {code}
> SELECT /*+ current_db: `my_db` */ id, name FROM (SELECT * FROM my_table WHERE
> id > 10);
> {code}
> During parsing time, we expand the view along using the provided database
> context.
> (We don't need to follow exactly the same hint, as I'm merely illustrating
> the high level approach here.)
> Note that there is a chance that the underlying base table(s)' schema change
> and the stored schema of the view might differ from the actual SQL schema. In
> that case, I think we should throw an exception at runtime to warn users.
> This exception can be controlled by a flag.
> Update 1: based on the discussion below, we don't even need to put the view
> definition in a sub query. We can just add it via a logical plan at the end.
> Update 2: we should make sure permanent views do not depend on temporary
> objects (views, tables, or functions).
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]