Mathieu1124 opened a new pull request #1920: URL: https://github.com/apache/hudi/pull/1920
…n using TimestampBasedKeyGenerator ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contributing.html before opening a pull request.* ## What is the purpose of the pull request *Fix unable to parse input partition field :1 exception when using TimestampBasedKeyGenerator * ## Brief change log scene to reproduce: 1. use TimestampBasedKeyGenerator 2. set hoodie.deltastreamer.keygen.timebased.timestamp.type = DATE_STRING 3. partitionpath field value is null when partitionpath field value is null TimestampBasedKeyGenerator will set it to1L, which can not be parsed correctly. `User class threw exception: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 4, prod-t3-data-lake-007, executor 6): org.apache.hudi.exception.HoodieDeltaStreamerException: Unable to parse input partition field :1 at org.apache.hudi.keygen.TimestampBasedKeyGenerator.getPartitionPath(TimestampBasedKeyGenerator.java:156) at org.apache.hudi.keygen.CustomKeyGenerator.getPartitionPath(CustomKeyGenerator.java:108) at org.apache.hudi.keygen.CustomKeyGenerator.getKey(CustomKeyGenerator.java:78) at org.apache.hudi.utilities.deltastreamer.DeltaSync.lambda$readFromSource$9fce03f0$1(DeltaSync.java:343) at org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1.apply(JavaPairRDD.scala:1040) at scala.collection.Iterator$$anon$11.next(Iterator.scala:410) at scala.collection.Iterator$$anon$10.next(Iterator.scala:394) at scala.collection.Iterator$class.foreach(Iterator.scala:891) at scala.collection.AbstractIterator.foreach(Iterator.scala:1334) at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48) at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310) at scala.collection.AbstractIterator.to(Iterator.scala:1334) at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302) at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1334) at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289) at scala.collection.AbstractIterator.toArray(Iterator.scala:1334) at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$29.apply(RDD.scala:1364) at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$29.apply(RDD.scala:1364) at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2121) at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2121) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) at org.apache.spark.scheduler.Task.run(Task.scala:121) at org.apache.spark.executor.Executor$TaskRunner$$anonfun$11.apply(Executor.scala:407) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1408) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:413) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.RuntimeException: hoodie.deltastreamer.keygen.timebased.timestamp.scalar.time.unit is not specified but scalar it supplied as time value at org.apache.hudi.keygen.TimestampBasedKeyGenerator.convertLongTimeToMillis(TimestampBasedKeyGenerator.java:163) at org.apache.hudi.keygen.TimestampBasedKeyGenerator.getPartitionPath(TimestampBasedKeyGenerator.java:138) ... 29 more` ## Verify this pull request This pull request is already covered by org.apache.hudi.keygen.TestTimestampBasedKeyGenerator#testTimestampBasedKeyGenerator*. ## Committer checklist - [ ] Has a corresponding JIRA in PR title & commit - [ ] Commit message is descriptive of the change - [ ] CI is green - [ ] Necessary doc changes done or have another open PR - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
