Repository: spark
Updated Branches:
  refs/heads/branch-2.1 eb0b3631d -> 55b1142bd

[SPARK-18145] Update documentation for hive partition management in 2.1

## What changes were proposed in this pull request?

This documents the partition handling changes for Spark 2.1 and how to migrate existing tables.

## How was this patch tested?

Built docs locally.

rxin

Author: Eric Liang <[email protected]>

Closes #16074 from ericl/spark-18145.

(cherry picked from commit 489845f3a0e2a3555b96b6f3dbb984c783b20d97)
Signed-off-by: Reynold Xin <[email protected]>

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/55b1142b
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/55b1142b
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/55b1142b

Branch: refs/heads/branch-2.1
Commit: 55b1142bdbdcb9005e384a99ff5dffd3ae24216b
Parents: eb0b363
Author: Eric Liang <[email protected]>
Authored: Tue Nov 29 20:06:39 2016 -0800
Committer: Reynold Xin <[email protected]>
Committed: Tue Nov 29 20:06:45 2016 -0800

----------------------------------------------------------------------
 docs/sql-programming-guide.md | 9 +++++++++
 1 file changed, 9 insertions(+)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/55b1142b/docs/sql-programming-guide.md
----------------------------------------------------------------------
diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index 3093d48..51ba911 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -1320,6 +1320,15 @@ options.
 
 # Migration Guide
 
+## Upgrading From Spark SQL 2.0 to 2.1
+
+ - Datasource tables now store partition metadata in the Hive metastore. This means that Hive DDLs such as `ALTER TABLE PARTITION ... SET LOCATION` are now available for tables created with the Datasource API.
+    - Legacy datasource tables can be migrated to this format via the `MSCK REPAIR TABLE` command. Migrating legacy tables is recommended to take advantage of Hive DDL support and improved planning performance.
+    - To determine if a table has been migrated, look for the `PartitionProvider: Catalog` attribute when issuing `DESCRIBE FORMATTED` on the table.
+ - Changes to `INSERT OVERWRITE TABLE ... PARTITION ...` behavior for Datasource tables.
+    - In prior Spark versions `INSERT OVERWRITE` overwrote the entire Datasource table, even when given a partition specification. Now only partitions matching the specification are overwritten.
+    - Note that this still differs from the behavior of Hive tables, which is to overwrite only partitions overlapping with newly inserted data.
+
 ## Upgrading From Spark SQL 1.6 to 2.0
 
  - `SparkSession` is now the new entry point of Spark that replaces the old `SQLContext` and
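
The `INSERT OVERWRITE` entries in the migration note describe three different behaviors. The sketch below is a toy model in plain Python, not Spark or Hive code, that may make the difference concrete. It assumes a hypothetical table partitioned by columns `a` and `b`, and a statement of the form `INSERT OVERWRITE TABLE t PARTITION (a=1, b)` whose incoming rows all land in `b=2`. Every function name, variable, and sample row is invented for illustration.

```python
# Toy model only: a "table" is a dict mapping partition values (a, b) to rows.
# None of these functions are Spark or Hive APIs; all names are hypothetical.

PARTITION_COLUMNS = ("a", "b")


def matches(partition, static_spec):
    """Return True if the partition agrees with every static value in the spec."""
    values = dict(zip(PARTITION_COLUMNS, partition))
    return all(values[col] == val for col, val in static_spec.items())


def overwrite_pre_2_1(table, static_spec, new_data):
    """Datasource tables before Spark 2.1: the whole table is replaced."""
    return dict(new_data)


def overwrite_spark_2_1(table, static_spec, new_data):
    """Datasource tables in Spark 2.1: partitions matching the spec are replaced."""
    result = {p: rows for p, rows in table.items() if not matches(p, static_spec)}
    result.update(new_data)
    return result


def overwrite_hive(table, static_spec, new_data):
    """Hive tables: only partitions that receive new data are replaced."""
    result = dict(table)
    result.update(new_data)
    return result


existing = {(1, 1): ["old-11"], (1, 2): ["old-12"], (2, 1): ["old-21"]}
incoming = {(1, 2): ["new-12"]}  # rows written by the INSERT
spec = {"a": 1}                  # static part of PARTITION (a=1, b)

after_pre_2_1 = overwrite_pre_2_1(existing, spec, incoming)
after_spark_2_1 = overwrite_spark_2_1(existing, spec, incoming)
after_hive = overwrite_hive(existing, spec, incoming)

print(sorted(after_pre_2_1))    # [(1, 2)]
print(sorted(after_spark_2_1))  # [(1, 2), (2, 1)]
print(sorted(after_hive))       # [(1, 1), (1, 2), (2, 1)]
```

In this model, partition `(1, 1)` is the one that separates the 2.1 Datasource behavior from Hive: it matches the specification `a=1` but receives no new rows, so it is dropped under the 2.1 Datasource rule and kept under the Hive rule.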
