nguyetlt1012 commented on issue #13834: URL: https://github.com/apache/hudi/issues/13834#issuecomment-3283420084
I found out why it returned "Compact failed". The detailed error is:

```
org.apache.spark.SparkUpgradeException: [INCONSISTENT_BEHAVIOR_CROSS_VERSION.WRITE_ANCIENT_DATETIME] You may get a different result due to the upgrading to Spark >= 3.0: writing dates before 1582-10-15 or timestamps before 1900-01-01T00:00:00Z into Parquet files can be dangerous, as the files may be read by Spark 2.x or legacy versions of Hive later, which uses a legacy hybrid calendar that is different from Spark 3.0+'s Proleptic Gregorian calendar. See more details in SPARK-31404. You can set "spark.sql.parquet.datetimeRebaseModeInWrite" to "LEGACY" to rebase the datetime values w.r.t. the calendar difference during writing, to get maximum interoperability. Or set the config to "CORRECTED" to write the datetime values as it is, if you are sure that the written files will only be read by Spark 3.0+ or other systems that use Proleptic Gregorian calendar.
```

I then tried running `HoodieCompactor` with the rebase configs passed through spark-submit, but the compaction still failed:

```
--conf 'spark.sql.parquet.datetimeRebaseModeInRead=LEGACY' \
--conf 'spark.sql.parquet.datetimeRebaseModeInWrite=LEGACY' \
--conf 'spark.sql.parquet.int96RebaseModeInWrite=LEGACY' \
--conf 'spark.sql.parquet.int96RebaseModeInRead=LEGACY' \
--conf 'spark.sql.avro.datetimeRebaseModeInRead=LEGACY' \
--conf 'spark.sql.avro.datetimeRebaseModeInWrite=LEGACY' \
--class org.apache.hudi.utilities.HoodieCompactor \
```

Finally, I switched to running the compaction through `spark-sql`, and it worked well.

Thank you @rangareddy for your support.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
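
The comment does not include the `spark-sql` command that finally worked. As a rough illustration only, the sketch below assembles one plausible invocation: it passes the six rebase settings quoted above as `--conf` options and runs Hudi's `run_compaction` call procedure with `-e`. The table name `my_hudi_table`, the helper `build_compaction_command`, and the choice to pass the settings via `--conf` (rather than `SET` statements) are assumptions, not details from the report; the three session settings are the ones Hudi's quick start lists for enabling its SQL extensions.

```python
import shlex
from typing import List

# The six rebase settings quoted in the comment above.
REBASE_CONFS = {
    "spark.sql.parquet.datetimeRebaseModeInRead": "LEGACY",
    "spark.sql.parquet.datetimeRebaseModeInWrite": "LEGACY",
    "spark.sql.parquet.int96RebaseModeInRead": "LEGACY",
    "spark.sql.parquet.int96RebaseModeInWrite": "LEGACY",
    "spark.sql.avro.datetimeRebaseModeInRead": "LEGACY",
    "spark.sql.avro.datetimeRebaseModeInWrite": "LEGACY",
}

# Session settings from the Hudi quick start (assumed here, not in the comment).
HUDI_SESSION_CONFS = {
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.sql.extensions": "org.apache.spark.sql.hudi.HoodieSparkSessionExtension",
    "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.hudi.catalog.HoodieCatalog",
}


def build_compaction_command(table: str) -> List[str]:
    """Assemble a spark-sql argument list that runs Hudi compaction on `table`.

    Hypothetical helper: it only builds the command, it does not launch Spark.
    The Hudi bundle jar still has to be on the classpath (e.g. via --jars).
    """
    cmd = ["spark-sql"]
    for key, value in {**HUDI_SESSION_CONFS, **REBASE_CONFS}.items():
        cmd += ["--conf", f"{key}={value}"]
    # run_compaction is a Hudi SQL call procedure; op => 'run' executes compaction.
    cmd += ["-e", f"CALL run_compaction(op => 'run', table => '{table}')"]
    return cmd


if __name__ == "__main__":
    # `my_hudi_table` is a placeholder table name.
    print(shlex.join(build_compaction_command("my_hudi_table")))
```

Printing the command instead of executing it makes it easy to check the quoting before running it against a real table.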
