Repository: spark
Updated Branches:
  refs/heads/master af9789a4f -> 489845f3a
[SPARK-18145] Update documentation for hive partition management in 2.1

## What changes were proposed in this pull request?

This documents the partition handling changes for Spark 2.1 and how to migrate existing tables.

## How was this patch tested?

Built docs locally.

rxin

Author: Eric Liang <[email protected]>

Closes #16074 from ericl/spark-18145.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/489845f3
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/489845f3
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/489845f3

Branch: refs/heads/master
Commit: 489845f3a0e2a3555b96b6f3dbb984c783b20d97
Parents: af9789a
Author: Eric Liang <[email protected]>
Authored: Tue Nov 29 20:06:39 2016 -0800
Committer: Reynold Xin <[email protected]>
Committed: Tue Nov 29 20:06:39 2016 -0800

----------------------------------------------------------------------
 docs/sql-programming-guide.md | 9 +++++++++
 1 file changed, 9 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/489845f3/docs/sql-programming-guide.md
----------------------------------------------------------------------
diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index 3adbe23..c7ad06c 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -1331,6 +1331,15 @@ options.
 
 # Migration Guide
 
+## Upgrading From Spark SQL 2.0 to 2.1
+
+ - Datasource tables now store partition metadata in the Hive metastore. This means that Hive DDLs such as `ALTER TABLE PARTITION ... SET LOCATION` are now available for tables created with the Datasource API.
+ - Legacy datasource tables can be migrated to this format via the `MSCK REPAIR TABLE` command. Migrating legacy tables is recommended to take advantage of Hive DDL support and improved planning performance.
+ - To determine if a table has been migrated, look for the `PartitionProvider: Catalog` attribute when issuing `DESCRIBE FORMATTED` on the table.
+ - Changes to `INSERT OVERWRITE TABLE ... PARTITION ...` behavior for Datasource tables.
+   - In prior Spark versions `INSERT OVERWRITE` overwrote the entire Datasource table, even when given a partition specification. Now only partitions matching the specification are overwritten.
+   - Note that this still differs from the behavior of Hive tables, which is to overwrite only partitions overlapping with newly inserted data.
+
 ## Upgrading From Spark SQL 1.6 to 2.0
 
 - `SparkSession` is now the new entry point of Spark that replaces the old `SQLContext` and
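For reference, a minimal Scala sketch of the migration workflow the new section describes. It assumes a `SparkSession` with Hive support enabled; the table name `logs` is a hypothetical example, not part of this patch.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a session backed by the Hive metastore.
val spark = SparkSession.builder()
  .appName("partition-migration-sketch")
  .enableHiveSupport()
  .getOrCreate()

// Register partition metadata for a legacy datasource table in the metastore.
spark.sql("MSCK REPAIR TABLE logs")

// Verify the table has been migrated: the output should contain a
// `PartitionProvider: Catalog` row once partitions are catalog-managed.
spark.sql("DESCRIBE FORMATTED logs").show(numRows = 100, truncate = false)
```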
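Similarly, a sketch of the `INSERT OVERWRITE ... PARTITION` behavior change for datasource tables. The table `logs`, its `ds` partition column, and the source table `staging_logs` are hypothetical examples used only for illustration.

```scala
// Hypothetical partitioned datasource table.
spark.sql(
  """CREATE TABLE logs (event STRING, ds STRING)
    |USING parquet
    |PARTITIONED BY (ds)""".stripMargin)

// In prior Spark versions this overwrote the entire table; in 2.1 only the
// partition matching the specification (ds='2016-11-29') is overwritten.
spark.sql(
  """INSERT OVERWRITE TABLE logs PARTITION (ds='2016-11-29')
    |SELECT event FROM staging_logs WHERE ds = '2016-11-29'""".stripMargin)
```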
