Repository: spark
Updated Branches:
  refs/heads/master af9789a4f -> 489845f3a


[SPARK-18145] Update documentation for hive partition management in 2.1

## What changes were proposed in this pull request?

This adds a section to the migration guide documenting the partition handling
changes for Datasource tables in Spark 2.1 and how to migrate existing tables.

## How was this patch tested?

Built docs locally.

rxin

Author: Eric Liang <[email protected]>

Closes #16074 from ericl/spark-18145.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/489845f3
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/489845f3
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/489845f3

Branch: refs/heads/master
Commit: 489845f3a0e2a3555b96b6f3dbb984c783b20d97
Parents: af9789a
Author: Eric Liang <[email protected]>
Authored: Tue Nov 29 20:06:39 2016 -0800
Committer: Reynold Xin <[email protected]>
Committed: Tue Nov 29 20:06:39 2016 -0800

----------------------------------------------------------------------
 docs/sql-programming-guide.md | 9 +++++++++
 1 file changed, 9 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/489845f3/docs/sql-programming-guide.md
----------------------------------------------------------------------
diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index 3adbe23..c7ad06c 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -1331,6 +1331,15 @@ options.
 
 # Migration Guide
 
+## Upgrading From Spark SQL 2.0 to 2.1
+
+ - Datasource tables now store partition metadata in the Hive metastore. This 
means that Hive DDLs such as `ALTER TABLE PARTITION ... SET LOCATION` are now 
available for tables created with the Datasource API.
+    - Legacy datasource tables can be migrated to this format via the `MSCK 
REPAIR TABLE` command. Migrating legacy tables is recommended to take advantage 
of Hive DDL support and improved planning performance.
+    - To determine if a table has been migrated, look for the 
`PartitionProvider: Catalog` attribute when issuing `DESCRIBE FORMATTED` on the 
table.
+ - Changes to `INSERT OVERWRITE TABLE ... PARTITION ...` behavior for 
Datasource tables.
+    - In prior Spark versions `INSERT OVERWRITE` overwrote the entire 
Datasource table, even when given a partition specification. Now only 
partitions matching the specification are overwritten.
+    - Note that this still differs from the behavior of Hive tables, which is 
to overwrite only partitions overlapping with newly inserted data.
+
 ## Upgrading From Spark SQL 1.6 to 2.0
 
  - `SparkSession` is now the new entry point of Spark that replaces the old 
`SQLContext` and
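
The migration check described in the added section (look for `PartitionProvider: Catalog` in the `DESCRIBE FORMATTED` output) can be sketched as a small helper. This is a minimal illustration, not part of the patch: the function name `tracks_partitions_in_catalog` is hypothetical, and the exact row layout of `DESCRIBE FORMATTED` is an assumption, so the helper strips spacing and punctuation before comparing.

```python
import re


def tracks_partitions_in_catalog(describe_rows):
    """Return True if a DESCRIBE FORMATTED row reports the Catalog partition provider.

    `describe_rows` is any iterable of rows, where each row is an iterable of
    cells (strings or None). The layout of the attribute is an assumption: it
    may sit in one cell ("PartitionProvider: Catalog") or be split across two
    ("Partition Provider:", "Catalog"), so cells are joined and normalized.
    """
    for row in describe_rows:
        # Join the non-null cells of the row into one string.
        text = "".join(str(cell) for cell in row if cell is not None)
        # Keep letters only, so spacing, colons, and case do not matter.
        normalized = re.sub(r"[^a-z]", "", text.lower())
        if normalized == "partitionprovidercatalog":
            return True
    return False
```

With a PySpark session (here called `spark`) and a hypothetical table `my_table`, the helper could be fed `spark.sql("DESCRIBE FORMATTED my_table").collect()`. A legacy table would be expected to return `False` until `MSCK REPAIR TABLE my_table` has been run.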

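The three `INSERT OVERWRITE TABLE ... PARTITION ...` behaviors contrasted in the added section can be compared with a toy model. The sketch below is plain Python, not Spark code: the table is a dict from partition keys to rows, and the names `matches`, `insert_overwrite`, and the mode strings are hypothetical labels chosen for illustration.

```python
def matches(partition, spec):
    """True if the partition agrees with every column fixed in the spec."""
    values = dict(partition)
    return all(values.get(col) == val for col, val in spec.items())


def insert_overwrite(table, spec, new_data, mode):
    """Toy model of INSERT OVERWRITE ... PARTITION on a partitioned table.

    table and new_data map a partition key, written as a tuple of
    (column, value) pairs, to a list of rows. spec is the partition
    specification as a dict of the columns it fixes.
    """
    if mode == "spark-2.0":
        # Datasource tables before 2.1: the whole table is overwritten.
        kept = {}
    elif mode == "spark-2.1":
        # Datasource tables in 2.1: only partitions matching the spec are dropped.
        kept = {p: rows for p, rows in table.items() if not matches(p, spec)}
    elif mode == "hive":
        # Hive tables: only partitions that receive new data are replaced.
        kept = {p: rows for p, rows in table.items() if p not in new_data}
    else:
        raise ValueError("unknown mode: " + mode)
    result = dict(kept)
    result.update(new_data)
    return result


# Three existing partitions; the insert targets year=2016 and writes month=1 only.
p_2015_12 = (("year", 2015), ("month", 12))
p_2016_01 = (("year", 2016), ("month", 1))
p_2016_02 = (("year", 2016), ("month", 2))
table = {p_2015_12: ["a"], p_2016_01: ["b"], p_2016_02: ["c"]}
spec = {"year": 2016}
new_data = {p_2016_01: ["new"]}

for mode in ("spark-2.0", "spark-2.1", "hive"):
    print(mode, sorted(insert_overwrite(table, spec, new_data, mode)))
```

In this model the 2.0 mode keeps only the newly written partition, the 2.1 mode additionally keeps the untouched `year=2015` partition, and the Hive mode also keeps `year=2016, month=2` because no new data landed there.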
