[I] Support different Spark internal Timestamp and Date types [hudi]

via GitHub Sat, 29 Nov 2025 20:36:46 -0800


hudi-bot opened a new issue, #14947:
URL: https://github.com/apache/hudi/issues/14947


   In Spark 3 a configuration was added, 
{{spark.sql.datetime.java8API.enabled}}, which changes the Row type of 
Timestamp and Date values from java.sql.Timestamp and java.sql.Date to 
*java.time.Instant* and *java.time.LocalDate*. 
   
   https://issues.apache.org/jira/browse/SPARK-27008
   
   In Spark 3.1 this is enabled by default through spark-sql, which breaks 
writes that use Timestamps. It is also likely that this could become the 
default across all of Spark in the future, at which point it would be a 
breaking issue.
   
   Right now neither AvroConversionHelper 
([ref|https://github.com/apache/hudi/blob/master/hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/AvroConversionHelper.scala#L301-L304])
 nor SqlKeyGenerator 
([ref|https://github.com/apache/hudi/blob/master/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/SqlKeyGenerator.scala])
 handles these types properly.
   
   When partitioned by a Timestamp column:
   {code:java}
   Caused by: java.lang.IllegalArgumentException: Invalid format: "2021-05-07T00:00:00Z" is malformed at "T00:00:00Z"
       at org.joda.time.format.DateTimeParserBucket.doParseMillis(DateTimeParserBucket.java:187)
       at org.joda.time.format.DateTimeFormatter.parseMillis(DateTimeFormatter.java:826)
       at org.apache.spark.sql.hudi.command.SqlKeyGenerator.$anonfun$convertPartitionPathToSqlType$1(SqlKeyGenerator.scala:94)
       at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
       at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
       at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
       at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
       at scala.collection.TraversableLike.map(TraversableLike.scala:238)
       at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
       at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:198)
       at org.apache.spark.sql.hudi.command.SqlKeyGenerator.convertPartitionPathToSqlType(SqlKeyGenerator.scala:85)
       at org.apache.spark.sql.hudi.command.SqlKeyGenerator.getPartitionPath(SqlKeyGenerator.scala:115)
       at org.apache.spark.sql.UDFRegistration.$anonfun$register$352(UDFRegistration.scala:777)
   {code}
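
The string in the trace above, "2021-05-07T00:00:00Z", is the ISO-8601 form that java.time.Instant produces from toString(), and the parser rejects it at the "T". A minimal sketch of a tolerant parser follows. It is not Hudi's code: it uses java.time rather than Joda, the class and method names are hypothetical, the legacy pattern "uuuu-MM-dd HH:mm:ss" is an assumption about what the key generator expects, and UTC is assumed for the legacy form.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical helper, for illustration only.
class PartitionTimestampParser {
    // Assumed legacy layout (java.sql.Timestamp style, without fractional seconds).
    private static final DateTimeFormatter SQL_TIMESTAMP =
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss");

    // Returns epoch milliseconds for either the ISO instant form or the legacy form.
    static long parseToEpochMillis(String raw) {
        try {
            // Form produced by Instant.toString(), e.g. 2021-05-07T00:00:00Z.
            return Instant.parse(raw).toEpochMilli();
        } catch (DateTimeParseException notIsoInstant) {
            // Fall back to the legacy layout; UTC is an assumption of this sketch.
            return LocalDateTime.parse(raw, SQL_TIMESTAMP)
                    .toInstant(ZoneOffset.UTC)
                    .toEpochMilli();
        }
    }
}
```

With this sketch, both "2021-05-07T00:00:00Z" and "2021-05-07 00:00:00" resolve to the same epoch value. A real fix would likely need to honor the Spark session time zone instead of assuming UTC.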
   When inserting a column of type Timestamp:
   {code:java}
   21/10/21 18:14:17 WARN TaskSetManager: Lost task 0.0 in stage 2.0 (TID 2) (ip-10-71-235-164.ec2.internal executor 20): java.lang.ClassCastException: java.time.Instant cannot be cast to java.sql.Timestamp
       at org.apache.hudi.AvroConversionHelper$.$anonfun$createConverterToAvro$8(AvroConversionHelper.scala:304)
       at org.apache.hudi.AvroConversionHelper$.$anonfun$createConverterToAvro$8$adapted(AvroConversionHelper.scala:304)
       at scala.Option.map(Option.scala:230)
       at org.apache.hudi.AvroConversionHelper$.$anonfun$createConverterToAvro$7(AvroConversionHelper.scala:304)
       at org.apache.hudi.AvroConversionHelper$.$anonfun$createConverterToAvro$15(AvroConversionHelper.scala:362)
       at org.apache.hudi.HoodieSparkUtils$.$anonfun$createRddInternal$3(HoodieSparkUtils.scala:138)
    {code}
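
The ClassCastException above comes from casting the row value straight to java.sql.Timestamp. One possible way to avoid it is to branch on the runtime type, so that both the legacy and the Java 8 representations are accepted. The sketch below illustrates that idea in plain Java; it is not the Hudi implementation. The class and method names are hypothetical, and epoch microseconds and epoch days are assumed as target units, since the issue does not state what the converter writes.

```java
import java.sql.Date;
import java.sql.Timestamp;
import java.time.Instant;
import java.time.LocalDate;

// Hypothetical helper, for illustration only.
class DateTimeCompat {
    // Accepts java.time.Instant or java.sql.Timestamp; returns epoch microseconds (assumed unit).
    static long toEpochMicros(Object value) {
        final Instant instant;
        if (value instanceof Instant) {
            instant = (Instant) value;               // java8API.enabled = true
        } else if (value instanceof Timestamp) {
            instant = ((Timestamp) value).toInstant(); // legacy representation
        } else {
            throw new IllegalArgumentException("Unsupported timestamp type: " + value);
        }
        // getNano() is never negative, so this stays correct for pre-epoch instants.
        return Math.addExact(
                Math.multiplyExact(instant.getEpochSecond(), 1_000_000L),
                instant.getNano() / 1_000L);
    }

    // Accepts java.time.LocalDate or java.sql.Date; returns days since 1970-01-01.
    static int toEpochDays(Object value) {
        if (value instanceof LocalDate) {
            return Math.toIntExact(((LocalDate) value).toEpochDay());
        }
        if (value instanceof Date) {
            // java.sql.Date.toInstant() is unsupported, so go through toLocalDate().
            return Math.toIntExact(((Date) value).toLocalDate().toEpochDay());
        }
        throw new IllegalArgumentException("Unsupported date type: " + value);
    }
}
```

Matching on the runtime type keeps the converter independent of how {{spark.sql.datetime.java8API.enabled}} is set, at the cost of one extra type check per value.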
    
   
   ## JIRA info
   
   - Link: https://issues.apache.org/jira/browse/HUDI-2972
   - Type: Improvement
   
   
   ---
   
   
   ## Comments
   
   13/Jan/22 22:23;shivnarayan;CC [~rxu] [[email protected]] 
   
   ---
   
   05/Feb/22 09:00;[email protected];[~ryanpife] can you retry with the Hudi 
master branch, which includes 
[HUDI-3125|https://github.com/apache/hudi/pull/4471]?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
