nguyetlt1012 commented on issue #13834:
URL: https://github.com/apache/hudi/issues/13834#issuecomment-3283420084

   I found out why it returned "Compact failed". The detailed error is:
   ```
   org.apache.spark.SparkUpgradeException: 
[INCONSISTENT_BEHAVIOR_CROSS_VERSION.WRITE_ANCIENT_DATETIME] You may get a 
different result due to the upgrading to Spark >= 3.0:
   writing dates before 1582-10-15 or timestamps before 1900-01-01T00:00:00Z
   into Parquet files can be dangerous, as the files may be read by Spark 2.x
   or legacy versions of Hive later, which uses a legacy hybrid calendar that
   is different from Spark 3.0+'s Proleptic Gregorian calendar. See more
   details in SPARK-31404. You can set 
"spark.sql.parquet.datetimeRebaseModeInWrite" to "LEGACY" to rebase the
   datetime values w.r.t. the calendar difference during writing, to get maximum
   interoperability. Or set the config to "CORRECTED" to write the datetime
   values as it is, if you are sure that the written files will only be read by
   Spark 3.0+ or other systems that use Proleptic Gregorian calendar.
   ```
   Then I also tried running `HoodieCompactor` with the configs passed through 
spark-submit, but it did not work:
   ```
   --conf 'spark.sql.parquet.datetimeRebaseModeInRead=LEGACY' \
   --conf 'spark.sql.parquet.datetimeRebaseModeInWrite=LEGACY' \
   --conf 'spark.sql.parquet.int96RebaseModeInWrite=LEGACY' \
   --conf 'spark.sql.parquet.int96RebaseModeInRead=LEGACY' \
   --conf 'spark.sql.avro.datetimeRebaseModeInRead=LEGACY' \
   --conf 'spark.sql.avro.datetimeRebaseModeInWrite=LEGACY' \
   --class org.apache.hudi.utilities.HoodieCompactor \
   ```
   
   Finally, I switched to using `spark-sql` to run compaction, and it worked 
well.
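
   For reference, the working `spark-sql` route can be sketched roughly as below. This is an assumption on my part, since the exact commands are not shown above: the table name is a placeholder, and `run_compaction` is Hudi's SQL call procedure for compaction (available in recent Hudi releases); the `SET` statements apply the same rebase configs within the session:
   ```sql
   -- Apply the legacy rebase behavior for this spark-sql session:
   SET spark.sql.parquet.datetimeRebaseModeInWrite=LEGACY;
   SET spark.sql.parquet.datetimeRebaseModeInRead=LEGACY;
   SET spark.sql.avro.datetimeRebaseModeInWrite=LEGACY;
   SET spark.sql.avro.datetimeRebaseModeInRead=LEGACY;

   -- Execute pending compaction plans on the table (placeholder name):
   CALL run_compaction(op => 'run', table => 'my_db.my_table');
   ```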
   
   Thank you, @rangareddy, for your support!
   