[spark] branch branch-3.0 updated: [SPARK-27528][FOLLOWUP] improve migration guide

wenchen Wed, 19 Feb 2020 06:49:11 -0800

This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 23b4554  [SPARK-27528][FOLLOWUP] improve migration guide
23b4554 is described below

commit 23b4554dadc7807bc6da9ef4a89fb7cf29e155f1
Author: Wenchen Fan <wenc...@databricks.com>
AuthorDate: Wed Feb 19 22:26:56 2020 +0800

    [SPARK-27528][FOLLOWUP] improve migration guide
    
    ### What changes were proposed in this pull request?
    
    Mention that the `INT96` timestamp type is still useful for interoperability.
    
    ### Why are the changes needed?
    
    Give users more context on the behavior change.
    
    ### Does this PR introduce any user-facing change?
    
    No
    
    ### How was this patch tested?
    
    N/A
    
    Closes #27622 from cloud-fan/parquet.
    
    Authored-by: Wenchen Fan <wenc...@databricks.com>
    Signed-off-by: Wenchen Fan <wenc...@databricks.com>
    (cherry picked from commit c7bece354132eef3677004bd796f82ef72f85bd1)
    Signed-off-by: Wenchen Fan <wenc...@databricks.com>
---
 docs/sql-migration-guide.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md
index 76df66b..0690127 100644
--- a/docs/sql-migration-guide.md
+++ b/docs/sql-migration-guide.md
@@ -87,7 +87,7 @@ license: |
 
   - In Spark version 2.4, when a spark session is created via 
`cloneSession()`, the newly created spark session inherits its configuration 
from its parent `SparkContext` even though the same configuration may exist 
with a different value in its parent spark session. Since Spark 3.0, the 
configurations of a parent `SparkSession` have a higher precedence over the 
parent `SparkContext`. The old behavior can be restored by setting 
`spark.sql.legacy.sessionInitWithConfigDefaults` to `true`.
 
-  - Since Spark 3.0, parquet logical type `TIMESTAMP_MICROS` is used by 
default while saving `TIMESTAMP` columns. In Spark version 2.4 and earlier, 
`TIMESTAMP` columns are saved as `INT96` in parquet files. To set `INT96` to 
`spark.sql.parquet.outputTimestampType` restores the previous behavior.
+  - Since Spark 3.0, parquet logical type `TIMESTAMP_MICROS` is used by 
default while saving `TIMESTAMP` columns. In Spark version 2.4 and earlier, 
`TIMESTAMP` columns are saved as `INT96` in parquet files. Note that, some SQL 
systems such as Hive 1.x and Impala 2.x can only read `INT96` timestamps, you 
can set `spark.sql.parquet.outputTimestampType` as `INT96` to restore the 
previous behavior and keep interoperability.
 
   - Since Spark 3.0, if `hive.default.fileformat` is not found in `Spark SQL 
configuration` then it will fallback to hive-site.xml present in the `Hadoop 
configuration` of `SparkContext`.
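
For readers applying the `INT96` note from the updated guide entry, a minimal PySpark sketch follows. It assumes only the configuration key and the two values named in the diff (`INT96`, `TIMESTAMP_MICROS`); the helper names `parquet_timestamp_conf` and `build_session` and the application name are hypothetical, introduced here for illustration. The value is set explicitly in both cases so the sketch does not depend on which default a given Spark build ships with.

```python
# Sketch only: the helper names below are hypothetical, not part of Spark.
OUTPUT_TS_KEY = "spark.sql.parquet.outputTimestampType"


def parquet_timestamp_conf(legacy_readers: bool) -> dict:
    """Return the config entry that selects the Parquet timestamp encoding.

    legacy_readers=True  -> INT96, for readers that only understand INT96
    legacy_readers=False -> TIMESTAMP_MICROS, the type named in the guide
    """
    value = "INT96" if legacy_readers else "TIMESTAMP_MICROS"
    return {OUTPUT_TS_KEY: value}


def build_session(legacy_readers: bool):
    """Create a SparkSession with the chosen encoding (requires pyspark)."""
    # Imported lazily so the pure helper above can be used without Spark.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName("parquet-timestamp-sketch")
    for key, value in parquet_timestamp_conf(legacy_readers).items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

With a session from `build_session(True)`, `TIMESTAMP` columns written through `DataFrame.write.parquet(...)` should be stored as `INT96`, which is the case the guide entry recommends for older Hive and Impala readers.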
 


