Mathieu1124 opened a new pull request #1920:
URL: https://github.com/apache/hudi/pull/1920


   …n using TimestampBasedKeyGenerator
   
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   *Fix unable to parse input partition field :1 exception when using 
TimestampBasedKeyGenerator *
   
   ## Brief change log
   
   scene to reproduce:
   
       1. use TimestampBasedKeyGenerator
       2. set hoodie.deltastreamer.keygen.timebased.timestamp.type = DATE_STRING
       3. partitionpath field value is null
   
   when partitionpath field value is null TimestampBasedKeyGenerator will set 
it to1L, which can not be parsed correctly.
   `User class threw exception: java.util.concurrent.ExecutionException: 
org.apache.hudi.exception.HoodieException: Job aborted due to stage failure: 
Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 
1.0 (TID 4, prod-t3-data-lake-007, executor 6): 
org.apache.hudi.exception.HoodieDeltaStreamerException: Unable to parse input 
partition field :1
    at 
org.apache.hudi.keygen.TimestampBasedKeyGenerator.getPartitionPath(TimestampBasedKeyGenerator.java:156)
    at 
org.apache.hudi.keygen.CustomKeyGenerator.getPartitionPath(CustomKeyGenerator.java:108)
    at 
org.apache.hudi.keygen.CustomKeyGenerator.getKey(CustomKeyGenerator.java:78)
    at 
org.apache.hudi.utilities.deltastreamer.DeltaSync.lambda$readFromSource$9fce03f0$1(DeltaSync.java:343)
    at 
org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1.apply(JavaPairRDD.scala:1040)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
    at scala.collection.Iterator$$anon$10.next(Iterator.scala:394)
    at scala.collection.Iterator$class.foreach(Iterator.scala:891)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
    at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
    at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
    at scala.collection.AbstractIterator.to(Iterator.scala:1334)
    at 
scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
    at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1334)
    at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
    at scala.collection.AbstractIterator.toArray(Iterator.scala:1334)
    at 
org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$29.apply(RDD.scala:1364)
    at 
org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$29.apply(RDD.scala:1364)
    at 
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2121)
    at 
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2121)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:121)
    at 
org.apache.spark.executor.Executor$TaskRunner$$anonfun$11.apply(Executor.scala:407)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1408)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:413)
    at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
    Caused by: java.lang.RuntimeException: 
hoodie.deltastreamer.keygen.timebased.timestamp.scalar.time.unit is not 
specified but scalar it supplied as time value
    at 
org.apache.hudi.keygen.TimestampBasedKeyGenerator.convertLongTimeToMillis(TimestampBasedKeyGenerator.java:163)
    at 
org.apache.hudi.keygen.TimestampBasedKeyGenerator.getPartitionPath(TimestampBasedKeyGenerator.java:138)
    ... 29 more`
   
   
   ## Verify this pull request
   
   This pull request is already covered by 
org.apache.hudi.keygen.TestTimestampBasedKeyGenerator#testTimestampBasedKeyGenerator*.
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    
    - [ ] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to