This is an automated email from the ASF dual-hosted git repository.
wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new bccdf1ffd467 [SPARK-50483][SPARK-50545][DOC][FOLLOWUP] Mention behavior changes in migration guide
bccdf1ffd467 is described below
commit bccdf1ffd467cb60ca6e100c20a1b659102eb304
Author: Cheng Pan <[email protected]>
AuthorDate: Fri Dec 20 23:23:33 2024 +0800
[SPARK-50483][SPARK-50545][DOC][FOLLOWUP] Mention behavior changes in migration guide
### What changes were proposed in this pull request?
Update migration guide for SPARK-50483 and SPARK-50545
### Why are the changes needed?
Mention behavior changes in migration guide
### Does this PR introduce _any_ user-facing change?
Yes, docs are updated.
### How was this patch tested?
Review.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #49252 from pan3793/SPARK-50483-SPARK-50545-followup.
Authored-by: Cheng Pan <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
---
docs/core-migration-guide.md | 6 ++++++
docs/sql-migration-guide.md | 5 +++++
2 files changed, 11 insertions(+)
diff --git a/docs/core-migration-guide.md b/docs/core-migration-guide.md
index 958e442545dc..49737392312a 100644
--- a/docs/core-migration-guide.md
+++ b/docs/core-migration-guide.md
@@ -54,6 +54,12 @@ license: |
- Since Spark 4.0, `spark.shuffle.unsafe.file.output.buffer` is deprecated though it still works. Use `spark.shuffle.localDisk.file.output.buffer` instead.
+- Since Spark 4.0, when a file read hits `org.apache.hadoop.security.AccessControlException` or `org.apache.hadoop.hdfs.BlockMissingException`, the exception will be thrown and will fail the task, even if `spark.files.ignoreCorruptFiles` is set to `true`.
+
+## Upgrading from Core 3.5.3 to 3.5.4
+
+- Since Spark 3.5.4, when a file read hits `org.apache.hadoop.security.AccessControlException` or `org.apache.hadoop.hdfs.BlockMissingException`, the exception will be thrown and will fail the task, even if `spark.files.ignoreCorruptFiles` is set to `true`.
+
## Upgrading from Core 3.4 to 3.5
- Since Spark 3.5, `spark.yarn.executor.failuresValidityInterval` is deprecated. Use `spark.executor.failuresValidityInterval` instead.
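The entries in this hunk both narrow what `spark.files.ignoreCorruptFiles` covers. A minimal `spark-defaults.conf` sketch of the affected properties (property names come from the guide above; the values shown are illustrative, not recommendations):

```
# Since Spark 4.0 / 3.5.4 this only skips genuinely corrupt files;
# AccessControlException and BlockMissingException now always fail the task,
# regardless of this setting.
spark.files.ignoreCorruptFiles              true

# Deprecated since 4.0 in favor of the property below (first entry in this hunk):
# spark.shuffle.unsafe.file.output.buffer   32k
spark.shuffle.localDisk.file.output.buffer  32k
```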
diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md
index 717d27befef0..254c54a414a7 100644
--- a/docs/sql-migration-guide.md
+++ b/docs/sql-migration-guide.md
@@ -29,6 +29,7 @@ license: |
- Since Spark 4.0, the default behaviour when inserting elements in a map is changed to first normalize keys `-0.0` to `0.0`. The affected SQL functions are `create_map`, `map_from_arrays`, `map_from_entries`, and `map_concat`. To restore the previous behaviour, set `spark.sql.legacy.disableMapKeyNormalization` to `true`.
- Since Spark 4.0, the default value of `spark.sql.maxSinglePartitionBytes` is changed from `Long.MaxValue` to `128m`. To restore the previous behavior, set `spark.sql.maxSinglePartitionBytes` to `9223372036854775807` (`Long.MaxValue`).
- Since Spark 4.0, any read of SQL tables takes into consideration the SQL configs `spark.sql.files.ignoreCorruptFiles`/`spark.sql.files.ignoreMissingFiles` instead of the core configs `spark.files.ignoreCorruptFiles`/`spark.files.ignoreMissingFiles`.
+- Since Spark 4.0, when a SQL table read hits `org.apache.hadoop.security.AccessControlException` or `org.apache.hadoop.hdfs.BlockMissingException`, the exception will be thrown and will fail the task, even if `spark.sql.files.ignoreCorruptFiles` is set to `true`.
- Since Spark 4.0, `spark.sql.hive.metastore` drops support for Hive versions prior to 2.0.0, as they require JDK 8, which Spark no longer supports. Users should migrate to higher versions.
- Since Spark 4.0, `spark.sql.parquet.compression.codec` drops support for the codec name `lz4raw`; use `lz4_raw` instead.
- Since Spark 4.0, when overflowing during casting a timestamp to byte/short/int under non-ANSI mode, Spark returns null instead of a wrapped value.
@@ -63,6 +64,10 @@ license: |
- Since Spark 4.0, the Storage-Partitioned Join feature flag `spark.sql.sources.v2.bucketing.pushPartValues.enabled` is set to `true` by default. To restore the previous behavior, set `spark.sql.sources.v2.bucketing.pushPartValues.enabled` to `false`.
- Since Spark 4.0, the `sentences` function uses `Locale(language)` instead of `Locale.US` when the `language` parameter is not `NULL` and the `country` parameter is `NULL`.
+## Upgrading from Spark SQL 3.5.3 to 3.5.4
+
+- Since Spark 3.5.4, when a SQL table read hits `org.apache.hadoop.security.AccessControlException` or `org.apache.hadoop.hdfs.BlockMissingException`, the exception will be thrown and will fail the task, even if `spark.sql.files.ignoreCorruptFiles` is set to `true`.
+
## Upgrading from Spark SQL 3.5.1 to 3.5.2
- Since 3.5.2, the MySQL JDBC datasource reads TINYINT UNSIGNED as ShortType; in 3.5.1, it was wrongly read as ByteType.
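The map-key normalization entry in the sql-migration-guide hunk can be motivated outside of Spark: in IEEE 754 arithmetic, `-0.0` compares and hashes equal to `0.0`, so unnormalized map keys behave surprisingly. A plain-Python sketch of the effect (ordinary `float` and `dict`, not Spark APIs):

```python
# IEEE 754 negative zero is equal to positive zero...
assert -0.0 == 0.0
assert hash(-0.0) == hash(0.0)

# ...so a hash map collapses both spellings onto a single key:
m = {0.0: "a"}
m[-0.0] = "b"          # overwrites the existing 0.0 entry
assert len(m) == 1
assert m[0.0] == "b"

# Yet the two zeros are still distinguishable when printed, which is why
# an engine must pick one canonical form for map keys:
assert str(-0.0) == "-0.0" and str(0.0) == "0.0"
```

Per the diff above, Spark 4.0 makes `0.0` the canonical key form in `create_map` and related functions, and `spark.sql.legacy.disableMapKeyNormalization` set to `true` restores the old behavior.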
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]