hudi-bot opened a new issue, #14947:
URL: https://github.com/apache/hudi/issues/14947
In Spark 3, a configuration was added,
{{spark.sql.datetime.java8API.enabled}}, which changes the external Row type
of Timestamp and Date values to *Instant* and *LocalDate* (java.time) instead of
java.sql.Timestamp and java.sql.Date.
https://issues.apache.org/jira/browse/SPARK-27008
In Spark 3.1 this is enabled by default when using spark-sql, which breaks
writes of Timestamp values. It may also be enabled by default across all of
Spark in the future, at which point this would become a breaking issue for all users.
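To illustrate what the flag changes, here is a minimal plain-JDK sketch (no Spark involved; the class and method names are hypothetical). Both Row representations carry the same value and convert with standard JDK calls, Timestamp.toInstant() / Timestamp.from(Instant) and java.sql.Date.toLocalDate() / Date.valueOf(LocalDate). Code that only casts to java.sql.Timestamp or java.sql.Date fails as soon as the java.time representation shows up.

```java
import java.sql.Date;
import java.sql.Timestamp;
import java.time.Instant;
import java.time.LocalDate;

// Hypothetical helper class: shows that the two Row representations are interchangeable.
final class Java8ApiTypes {
    // java8API disabled -> java.sql.Timestamp; enabled -> java.time.Instant.
    static Instant toInstant(Timestamp ts) {
        return ts.toInstant();
    }

    static Timestamp toTimestamp(Instant instant) {
        return Timestamp.from(instant); // keeps nanosecond precision
    }

    // java8API disabled -> java.sql.Date; enabled -> java.time.LocalDate.
    static LocalDate toLocalDate(Date date) {
        return date.toLocalDate();
    }

    static Date toSqlDate(LocalDate localDate) {
        return Date.valueOf(localDate);
    }

    public static void main(String[] args) {
        Instant instant = Instant.parse("2021-05-07T00:00:00Z");
        // Round trip Instant -> Timestamp -> Instant preserves the value.
        System.out.println(toInstant(toTimestamp(instant)).equals(instant));

        LocalDate day = LocalDate.of(2021, 5, 7);
        // Round trip LocalDate -> java.sql.Date -> LocalDate preserves the value.
        System.out.println(toLocalDate(toSqlDate(day)).equals(day));
    }
}
```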
Currently, AvroConversionHelper
([ref|https://github.com/apache/hudi/blob/master/hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/AvroConversionHelper.scala#L301-L304])
and SqlKeyGenerator
([ref|https://github.com/apache/hudi/blob/master/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/SqlKeyGenerator.scala])
do not handle these types properly.
When the table is partitioned by a Timestamp column:
{code:java}
Caused by: java.lang.IllegalArgumentException: Invalid format: "2021-05-07T00:00:00Z" is malformed at "T00:00:00Z"
	at org.joda.time.format.DateTimeParserBucket.doParseMillis(DateTimeParserBucket.java:187)
	at org.joda.time.format.DateTimeFormatter.parseMillis(DateTimeFormatter.java:826)
	at org.apache.spark.sql.hudi.command.SqlKeyGenerator.$anonfun$convertPartitionPathToSqlType$1(SqlKeyGenerator.scala:94)
	at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
	at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
	at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
	at scala.collection.TraversableLike.map(TraversableLike.scala:238)
	at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
	at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:198)
	at org.apache.spark.sql.hudi.command.SqlKeyGenerator.convertPartitionPathToSqlType(SqlKeyGenerator.scala:85)
	at org.apache.spark.sql.hudi.command.SqlKeyGenerator.getPartitionPath(SqlKeyGenerator.scala:115)
	at org.apache.spark.sql.UDFRegistration.$anonfun$register$352(UDFRegistration.scala:777)
{code}
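The trace shows the partition value arriving as the ISO-8601 string that Instant.toString() produces ("2021-05-07T00:00:00Z"), while the parser expects a space-separated timestamp. A minimal sketch of normalizing either representation to one string format before it is parsed back follows. The pattern yyyy-MM-dd HH:mm:ss, the explicit session ZoneId parameter, and the PartitionValueFormatter name are all assumptions for illustration, not SqlKeyGenerator's actual code.

```java
import java.sql.Timestamp;
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: formats a timestamp partition value the same way
// regardless of which Row representation Spark hands over.
final class PartitionValueFormatter {
    // Assumed pattern; it must match whatever the key generator parses back.
    private static final DateTimeFormatter PATTERN =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    static String format(Object value, ZoneId sessionZone) {
        Instant instant;
        if (value instanceof Instant) {
            instant = (Instant) value;                   // java8API enabled
        } else if (value instanceof Timestamp) {
            instant = ((Timestamp) value).toInstant();   // java8API disabled
        } else {
            throw new IllegalArgumentException("Unsupported timestamp value: " + value);
        }
        // An explicit zone keeps both branches consistent; Timestamp.toString()
        // would otherwise render in the JVM default zone.
        return PATTERN.withZone(sessionZone).format(instant);
    }

    public static void main(String[] args) {
        Instant instant = Instant.parse("2021-05-07T00:00:00Z");
        System.out.println(instant); // ISO-8601 form that the failing parser received
        System.out.println(format(instant, ZoneOffset.UTC));
        System.out.println(format(Timestamp.from(instant), ZoneOffset.UTC));
    }
}
```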
When inserting rows with a Timestamp column:
{code:java}
21/10/21 18:14:17 WARN TaskSetManager: Lost task 0.0 in stage 2.0 (TID 2) (ip-10-71-235-164.ec2.internal executor 20): java.lang.ClassCastException: java.time.Instant cannot be cast to java.sql.Timestamp
	at org.apache.hudi.AvroConversionHelper$.$anonfun$createConverterToAvro$8(AvroConversionHelper.scala:304)
	at org.apache.hudi.AvroConversionHelper$.$anonfun$createConverterToAvro$8$adapted(AvroConversionHelper.scala:304)
	at scala.Option.map(Option.scala:230)
	at org.apache.hudi.AvroConversionHelper$.$anonfun$createConverterToAvro$7(AvroConversionHelper.scala:304)
	at org.apache.hudi.AvroConversionHelper$.$anonfun$createConverterToAvro$15(AvroConversionHelper.scala:362)
	at org.apache.hudi.HoodieSparkUtils$.$anonfun$createRddInternal$3(HoodieSparkUtils.scala:138)
{code}
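The ClassCastException comes from casting the Row value directly to java.sql.Timestamp. One way to avoid it is to dispatch on the runtime type before converting. The sketch below assumes the Avro schema uses the timestamp-micros and date logical types (long microseconds and int days since the Unix epoch); TemporalToAvro and its methods are hypothetical names, and this is not Hudi's implementation.

```java
import java.sql.Date;
import java.sql.Timestamp;
import java.time.Instant;
import java.time.LocalDate;

// Hypothetical converter that accepts either Row representation.
final class TemporalToAvro {
    // TimestampType value -> Avro timestamp-micros (assumed target logical type).
    static Long toEpochMicros(Object value) {
        if (value == null) {
            return null;
        }
        Instant instant;
        if (value instanceof Instant) {
            instant = (Instant) value;                  // java8API.enabled=true
        } else if (value instanceof Timestamp) {
            instant = ((Timestamp) value).toInstant();  // java8API.enabled=false
        } else {
            throw new IllegalArgumentException("Unsupported timestamp value: " + value.getClass());
        }
        // Built from seconds + nanos so sub-millisecond precision is kept,
        // unlike Timestamp.getTime() * 1000, which has only millisecond precision.
        return Math.addExact(
                Math.multiplyExact(instant.getEpochSecond(), 1_000_000L),
                instant.getNano() / 1_000L);
    }

    // DateType value -> Avro date (assumed target logical type).
    static Integer toEpochDays(Object value) {
        if (value == null) {
            return null;
        }
        LocalDate date;
        if (value instanceof LocalDate) {
            date = (LocalDate) value;                   // java8API.enabled=true
        } else if (value instanceof Date) {
            date = ((Date) value).toLocalDate();        // java8API.enabled=false
        } else {
            throw new IllegalArgumentException("Unsupported date value: " + value.getClass());
        }
        return Math.toIntExact(date.toEpochDay());
    }

    public static void main(String[] args) {
        Instant instant = Instant.parse("2021-05-07T00:00:00.123456Z");
        // Both representations of the same moment give the same micros.
        System.out.println(toEpochMicros(instant));
        System.out.println(toEpochMicros(Timestamp.from(instant)));
        System.out.println(toEpochDays(LocalDate.of(2021, 5, 7)));
        System.out.println(toEpochDays(Date.valueOf(LocalDate.of(2021, 5, 7))));
    }
}
```

Dispatching on the runtime type keeps the converter correct whether or not spark.sql.datetime.java8API.enabled is set, so it never needs to read the flag.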
## JIRA info
- Link: https://issues.apache.org/jira/browse/HUDI-2972
- Type: Improvement
---
## Comments
13/Jan/22 22:23;shivnarayan;CC [~rxu] [[email protected]]
---
05/Feb/22 09:00;[email protected];[~ryanpife], can you retry with the Hudi
master branch, which includes
[HUDI-3125|https://github.com/apache/hudi/pull/4471]?