spark git commit: [SPARK-25797][SQL][DOCS][BACKPORT-2.3] Add migration doc for solving issues caused by view canonicalization approach change

dongjoon Sun, 28 Oct 2018 21:27:57 -0700

Repository: spark
Updated Branches:
  refs/heads/branch-2.2 5b1396596 -> 17d882adf



[SPARK-25797][SQL][DOCS][BACKPORT-2.3] Add migration doc for solving issues 
caused by view canonicalization approach change

## What changes were proposed in this pull request?
Since Spark 2.2, view definitions are stored differently than in prior 
versions. As a result, Spark may be unable to read views created by prior 
versions. See [SPARK-25797](https://issues.apache.org/jira/browse/SPARK-25797) 
for more details.

Basically, we have two options.
1) Make Spark 2.2+ able to read older view definitions. Since the expanded 
text is buggy and unusable, we have to use the original text (this is possible 
with [SPARK-25459](https://issues.apache.org/jira/browse/SPARK-25459)). However, 
because older Spark versions don't save the database context, we cannot always 
get correct view definitions without the view's default database.
2) Recreate the views with `ALTER VIEW AS` or `CREATE OR REPLACE VIEW AS`.

This PR adds a migration doc to help users troubleshoot this issue using 
option 2 above.
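
As an illustration of option 2, the sketch below builds one `CREATE OR REPLACE 
VIEW ... AS` statement per affected view from its original query text. It is 
not part of this patch: the helper `recreate_view_statements`, the view name, 
and the query are all hypothetical. Table names are written fully qualified 
here on the assumption that this avoids depending on a current database.

```python
def recreate_view_statements(view_queries):
    """Build one CREATE OR REPLACE VIEW statement per view.

    view_queries maps a view name to its original SELECT text.
    Both are supplied by the user; nothing is read from the metastore.
    """
    statements = []
    for name, query in view_queries.items():
        # Drop surrounding whitespace and any trailing semicolon.
        body = query.strip().rstrip(";").strip()
        statements.append(f"CREATE OR REPLACE VIEW {name} AS {body}")
    return statements


# Hypothetical view and query, for illustration only.
legacy_views = {
    "sales_db.recent_orders":
        "SELECT id, amount FROM sales_db.orders WHERE year >= 2018;",
}

for stmt in recreate_view_statements(legacy_views):
    # With an active SparkSession named `spark`, each statement could be
    # submitted via spark.sql(stmt) on the newer Spark version.
    print(stmt)
```

Generating the statements as plain strings keeps the sketch runnable without a 
Spark installation and lets them be reviewed before they are executed.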

## How was this patch tested?
N/A.

Docs are generated and checked locally:

```
cd docs
SKIP_API=1 jekyll serve --watch
```

Closes #22851 from seancxmao/SPARK-25797-2.3.

Authored-by: seancxmao <seancx...@gmail.com>
Signed-off-by: Dongjoon Hyun <dongj...@apache.org>
(cherry picked from commit 3e0160bacfbe4597f15ca410ca832617cdeeddca)
Signed-off-by: Dongjoon Hyun <dongj...@apache.org>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/17d882ad
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/17d882ad
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/17d882ad

Branch: refs/heads/branch-2.2
Commit: 17d882adf0b1bbbd4350b6d46756fab0fd602683
Parents: 5b13965
Author: seancxmao <seancx...@gmail.com>
Authored: Sun Oct 28 21:27:22 2018 -0700
Committer: Dongjoon Hyun <dongj...@apache.org>
Committed: Sun Oct 28 21:27:42 2018 -0700

----------------------------------------------------------------------
 docs/sql-programming-guide.md | 2 ++
 1 file changed, 2 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/17d882ad/docs/sql-programming-guide.md
----------------------------------------------------------------------
diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index 8cd4d05..758920e 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -1548,6 +1548,8 @@ options.
 
   - Spark 2.1.1 introduced a new configuration key: 
`spark.sql.hive.caseSensitiveInferenceMode`. It had a default setting of 
`NEVER_INFER`, which kept behavior identical to 2.1.0. However, Spark 2.2.0 
changes this setting's default value to `INFER_AND_SAVE` to restore 
compatibility with reading Hive metastore tables whose underlying file schema 
have mixed-case column names. With the `INFER_AND_SAVE` configuration value, on 
first access Spark will perform schema inference on any Hive metastore table 
for which it has not already saved an inferred schema. Note that schema 
inference can be a very time consuming operation for tables with thousands of 
partitions. If compatibility with mixed-case column names is not a concern, you 
can safely set `spark.sql.hive.caseSensitiveInferenceMode` to `NEVER_INFER` to 
avoid the initial overhead of schema inference. Note that with the new default 
`INFER_AND_SAVE` setting, the results of the schema inference are saved as a 
metastore key for future use. Therefore, the initial schema inference occurs 
only at a table's first access.
 
+  - Since Spark 2.2, view definitions are stored in a different way from prior 
versions. This may cause Spark unable to read views created by prior versions. 
In such cases, you need to recreate the views using `ALTER VIEW AS` or `CREATE 
OR REPLACE VIEW AS` with newer Spark versions.
+
 ## Upgrading From Spark SQL 2.0 to 2.1
 
  - Datasource tables now store partition metadata in the Hive metastore. This 
means that Hive DDLs such as `ALTER TABLE PARTITION ... SET LOCATION` are now 
available for tables created with the Datasource API.


