[https://issues.apache.org/jira/browse/HUDI-2972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17487436#comment-17487436]
Yann Byron commented on HUDI-2972:
----------------------------------
[~ryanpife] can you retry with the Hudi master branch, which includes
[HUDI-3125|https://github.com/apache/hudi/pull/4471]?
> Support different Spark internal Timestamp and Date types
> ---------------------------------------------------------
>
> Key: HUDI-2972
> URL: https://issues.apache.org/jira/browse/HUDI-2972
> Project: Apache Hudi
> Issue Type: Improvement
> Components: spark-sql
> Reporter: Ryan Pifer
> Priority: Critical
>
> In Spark 3, a configuration {{spark.sql.datetime.java8API.enabled}} was added
> that changes the Java types used in a Row for Timestamp and Date values from
> *java.sql.Timestamp* and *java.sql.Date* to *Instant* and *LocalDate*.
> https://issues.apache.org/jira/browse/SPARK-27008
> In Spark 3.1 this is enabled by default through spark-sql, which breaks
> writes that use Timestamps. It may also become the default across all of
> Spark in a future release, at which point this would be a breaking issue.
> Currently, AvroConversionHelper
> ([ref|https://github.com/apache/hudi/blob/master/hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/AvroConversionHelper.scala#L301-L304])
> and SqlKeyGenerator
> ([ref|https://github.com/apache/hudi/blob/master/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/SqlKeyGenerator.scala])
> do not handle these types properly.
> When partitioned by a Timestamp column:
> {code:java}
> Caused by: java.lang.IllegalArgumentException: Invalid format: "2021-05-07T00:00:00Z" is malformed at "T00:00:00Z"
>     at org.joda.time.format.DateTimeParserBucket.doParseMillis(DateTimeParserBucket.java:187)
>     at org.joda.time.format.DateTimeFormatter.parseMillis(DateTimeFormatter.java:826)
>     at org.apache.spark.sql.hudi.command.SqlKeyGenerator.$anonfun$convertPartitionPathToSqlType$1(SqlKeyGenerator.scala:94)
>     at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
>     at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
>     at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
>     at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
>     at scala.collection.TraversableLike.map(TraversableLike.scala:238)
>     at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
>     at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:198)
>     at org.apache.spark.sql.hudi.command.SqlKeyGenerator.convertPartitionPathToSqlType(SqlKeyGenerator.scala:85)
>     at org.apache.spark.sql.hudi.command.SqlKeyGenerator.getPartitionPath(SqlKeyGenerator.scala:115)
>     at org.apache.spark.sql.UDFRegistration.$anonfun$register$352(UDFRegistration.scala:777)
> {code}
> When inserting values of type Timestamp:
> {code:java}
> 21/10/21 18:14:17 WARN TaskSetManager: Lost task 0.0 in stage 2.0 (TID 2) (ip-10-71-235-164.ec2.internal executor 20): java.lang.ClassCastException: java.time.Instant cannot be cast to java.sql.Timestamp
>     at org.apache.hudi.AvroConversionHelper$.$anonfun$createConverterToAvro$8(AvroConversionHelper.scala:304)
>     at org.apache.hudi.AvroConversionHelper$.$anonfun$createConverterToAvro$8$adapted(AvroConversionHelper.scala:304)
>     at scala.Option.map(Option.scala:230)
>     at org.apache.hudi.AvroConversionHelper$.$anonfun$createConverterToAvro$7(AvroConversionHelper.scala:304)
>     at org.apache.hudi.AvroConversionHelper$.$anonfun$createConverterToAvro$15(AvroConversionHelper.scala:362)
>     at org.apache.hudi.HoodieSparkUtils$.$anonfun$createRddInternal$3(HoodieSparkUtils.scala:138)
> {code}
>
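Both traces above come from code that assumes the legacy java.sql types: a cast to java.sql.Timestamp in the Avro converter, and a formatter that expects "yyyy-MM-dd HH:mm:ss" while an Instant renders as "2021-05-07T00:00:00Z". A minimal sketch of type-tolerant handling follows, assuming the Avro targets are timestamp-micros (microseconds since epoch) and date (days since epoch). The class and methods (DateTimeNormalizer, toEpochMicros, toEpochDays, parsePartitionTimestamp) are hypothetical, not part of Hudi's API, and not taken from the HUDI-3125 change.

{code:java}
import java.sql.Date;
import java.sql.Timestamp;
import java.time.Instant;
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

// Hypothetical helper: accepts both the legacy (java.sql.*) and the
// Java 8 (java.time.*) external types Spark may pass to a converter.
final class DateTimeNormalizer {

    private DateTimeNormalizer() {}

    private static String typeName(Object value) {
        return value == null ? "null" : value.getClass().getName();
    }

    // Microseconds since the epoch, as stored in an Avro timestamp-micros field.
    static long toEpochMicros(Object value) {
        if (value instanceof Timestamp) {
            // Timestamp.toInstant() keeps the sub-millisecond nanos.
            return toEpochMicros(((Timestamp) value).toInstant());
        }
        if (value instanceof Instant) {
            Instant i = (Instant) value;
            // getNano() is always in [0, 999_999_999], so this is correct before 1970 too.
            return Math.addExact(Math.multiplyExact(i.getEpochSecond(), 1_000_000L), i.getNano() / 1_000L);
        }
        throw new IllegalArgumentException("Unsupported timestamp type: " + typeName(value));
    }

    // Days since the epoch, as stored in an Avro date field.
    static int toEpochDays(Object value) {
        if (value instanceof Date) {
            // java.sql.Date -> LocalDate uses the JVM default time zone.
            return Math.toIntExact(((Date) value).toLocalDate().toEpochDay());
        }
        if (value instanceof LocalDate) {
            return Math.toIntExact(((LocalDate) value).toEpochDay());
        }
        throw new IllegalArgumentException("Unsupported date type: " + typeName(value));
    }

    // Partition values may look like "2021-05-07T00:00:00Z" (Instant.toString())
    // or "2021-05-07 00:00:00" (Timestamp.toString()); accept either form.
    static long parsePartitionTimestamp(String s) {
        try {
            return toEpochMicros(Instant.parse(s));
        } catch (DateTimeParseException e) {
            // JDBC escape format, interpreted in the JVM default time zone.
            return toEpochMicros(Timestamp.valueOf(s));
        }
    }
}
{code}

The fallback branch reads "yyyy-mm-dd hh:mm:ss" strings in the JVM default time zone, so a real fix would need to choose a zone (for example Spark's session time zone) consistent with how the partition path was written.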