This is an automated email from the ASF dual-hosted git repository.
awong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/kudu.git
The following commit(s) were added to refs/heads/master by this push:
new a0db990 [backup] set spark.sql.legacy.parquet.int96RebaseModeInWrite
a0db990 is described below
commit a0db990e08173293e42a7490322f08681abaa5d3
Author: Andrew Wong
AuthorDate: Sat Mar 20 21:04:43 2021 -0700
[backup] set spark.sql.legacy.parquet.int96RebaseModeInWrite
After the bump to Spark 3.1.1, TestKuduBackup.testRandomBackupAndRestore
started failing with errors like the following:
02:04:37.919 [ERROR - Executor task launch worker for task 0.0 in stage 0.0 (TID 0)] (Logging.scala:94) Aborting task
org.apache.spark.SparkUpgradeException: You may get a different result due
to the upgrading of Spark 3.0: writing dates before 1582-10-15 or timestamps
before 1900-01-01T00:00:00Z into Parquet INT96 files can be dangerous, as the
files may be read by Spark 2.x or legacy versions of Hive later, which uses a
legacy hybrid calendar that is different from Spark 3.0+'s Proleptic Gregorian
calendar. See more details in SPARK-31404. You can set
spark.sql.legacy.parquet.int96RebaseModeInWrite [...]
at org.apache.spark.sql.execution.datasources.DataSourceUtils$.newRebaseExceptionInWrite(DataSourceUtils.scala:165) ~[spark-sql_2.12-3.1.1.jar:3.1.1]
...
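The calendar difference the error message refers to can be seen with the JDK alone. The sketch below is only an illustration, not Spark's rebase implementation, and the object name `CalendarRebaseDemo` is hypothetical. It converts the same year-month-day into days since 1970-01-01 under the Proleptic Gregorian calendar (`java.time.LocalDate`) and under the hybrid Julian/Gregorian calendar (`java.util.GregorianCalendar`, whose default cutover is 1582-10-15). It does not attempt to explain the separate 1900-01-01T00:00:00Z threshold for INT96 timestamps.

```scala
import java.time.LocalDate
import java.util.{GregorianCalendar, TimeZone}

object CalendarRebaseDemo {
  private val MillisPerDay: Long = 24L * 60 * 60 * 1000

  // Days since 1970-01-01 under the Proleptic Gregorian calendar,
  // the calendar Spark 3.0+ uses.
  def prolepticEpochDay(year: Int, month: Int, day: Int): Long =
    LocalDate.of(year, month, day).toEpochDay

  // Days since 1970-01-01 under the hybrid Julian/Gregorian calendar:
  // java.util.GregorianCalendar switches from Julian to Gregorian at 1582-10-15.
  def hybridEpochDay(year: Int, month: Int, day: Int): Long = {
    val cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"))
    cal.clear()                   // midnight UTC, no leftover fields
    cal.set(year, month - 1, day) // Calendar months are 0-based
    Math.floorDiv(cal.getTimeInMillis, MillisPerDay)
  }

  def main(args: Array[String]): Unit = {
    // One date before the cutover, the cutover itself, and a modern date.
    for ((y, m, d) <- Seq((1000, 1, 1), (1582, 10, 15), (2000, 1, 1))) {
      val p = prolepticEpochDay(y, m, d)
      val h = hybridEpochDay(y, m, d)
      println(s"$y-$m-$d: proleptic=$p hybrid=$h shift=${h - p} days")
    }
  }
}
```

Running `main` shows a non-zero shift only for the pre-1582 date: the same calendar label names a different day depending on which calendar the reader assumes, which is why Spark refuses to write such values without an explicit rebase mode.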
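Following the error message's suggestion, this sets spark.sql.legacy.parquet.int96RebaseModeInWrite to LEGACY, alongside the existing spark.sql.legacy.parquet.datetimeRebaseModeInWrite setting.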
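If more write-side rebase options appear in later Spark releases, one option is to keep them in a single list. This is a minimal sketch under that assumption; `LegacyParquetRebase` and `applyTo` are hypothetical names, not part of Kudu or Spark. Taking the setter as a plain function keeps the sketch free of a Spark dependency.

```scala
object LegacyParquetRebase {
  // Write-side rebase settings that KuduBackup sets after this change.
  val WriteRebaseConfs: Seq[(String, String)] = Seq(
    "spark.sql.legacy.parquet.datetimeRebaseModeInWrite" -> "LEGACY",
    "spark.sql.legacy.parquet.int96RebaseModeInWrite" -> "LEGACY"
  )

  // Hypothetical helper: applies every setting through the given setter,
  // e.g. (k, v) => session.conf.set(k, v) for a SparkSession.
  def applyTo(set: (String, String) => Unit): Unit =
    WriteRebaseConfs.foreach { case (key, value) => set(key, value) }
}
```

Usage would be `LegacyParquetRebase.applyTo((k, v) => session.conf.set(k, v))`; the same list could instead be passed to `SparkSession.builder()` via `.config(key, value)` when the session is created.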
Change-Id: Ib9ca4d9e69785dd9d056fa8e62c944d56cf219ed
Reviewed-on: http://gerrit.cloudera.org:8080/17213
Reviewed-by: Grant Henke
Tested-by: Andrew Wong
---
java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala | 1 +
1 file changed, 1 insertion(+)
diff --git a/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala b/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala
index c02f5de..13dcc5f 100644
--- a/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala
+++ b/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala
@@ -86,6 +86,7 @@ object KuduBackup {
// 1900-01-01T00:00:00Z in Parquet. Otherwise incorrect values may be read by
// Spark 2 or legacy version of Hive. See more details in SPARK-31404.
session.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInWrite", "LEGACY")
+ session.conf.set("spark.sql.legacy.parquet.int96RebaseModeInWrite", "LEGACY")
// Write the data to the backup path.
// The backup path contains the timestampMs and should not already exist.