chrischnweiss opened a new issue #4946:
URL: https://github.com/apache/hudi/issues/4946
**Describe the problem you faced**
Hey guys,
actually I am trying to use DeltaStreamer along with a CustomKeyGenerator to
use ComplexKeyGenerator and TimeBasedKeyGenerator together. Now I am struggling
with DateFormat Exception and I can“t figure out why. Maybe you can help me?
These are my config properties for the partitoning:
```
hoodie.datasource.write.recordkey.field=id_1,id_2
hoodie.datasource.write.partitionpath.field=timestamp_col:timestamp
hoodie.datasource.write.hive_style_partitioning=true
hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.CustomKeyGenerator
hoodie.deltastreamer.keygen.timebased.timestamp.type=DATE_STRING
hoodie.deltastreamer.keygen.timebased.input.dateformat="yyyy-MM-dd
HH:mm:ss.SSS Z"
hoodie.deltastreamer.keygen.timebased.output.dateformat="yyyy/MM/dd"
```
**Expected behavior**
I want to parse my date_string as input to partition the target dataset.
My timestamp_col looks like this: `2022-01-13 16:57:05.659 +01:00`
**Environment Description**
Hudi version :
0.10.0
Spark version :
3.1.1
Hive version :
3.1.3000
Hadoop version :
3.1.1
Storage (HDFS/S3/GCS..) :
HDFS
Running on Docker? (yes/no) :
no
**Stacktrace**
Exception in thread "main" org.apache.spark.SparkException: Job aborted due
to stage failure: Task 0 in stage 2.0 failed 4 times, most recent failure: Lost
task 0.3 in stage 2.0 (TID 11) (t-worker.node.asdasd.de executor 1):
org.apache.hudi.exception.HoodieKeyGeneratorException: Unable to parse input
partition field :2022-01-13 16:57:05.659 +01:00
at
org.apache.hudi.keygen.TimestampBasedAvroKeyGenerator.getPartitionPath(TimestampBasedAvroKeyGenerator.java:135)
at
org.apache.hudi.keygen.CustomAvroKeyGenerator.getPartitionPath(CustomAvroKeyGenerator.java:89)
at
org.apache.hudi.keygen.CustomKeyGenerator.getPartitionPath(CustomKeyGenerator.java:68)
at
org.apache.hudi.keygen.BaseKeyGenerator.getKey(BaseKeyGenerator.java:62)
at
org.apache.hudi.utilities.deltastreamer.DeltaSync.lambda$readFromSource$d62e16$1(DeltaSync.java:453)
at
org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1070)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
at
org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:222)
at
org.apache.spark.storage.memory.MemoryStore.putIteratorAsBytes(MemoryStore.scala:349)
at
org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1440)
at
org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1350)
at
org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1414)
at
org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1237)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:384)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:335)
at
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at
org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException: Invalid format: "2022-01-13
16:57:05.659 +01:00"
at
org.joda.time.format.DateTimeFormatter.parseDateTime(DateTimeFormatter.java:945)
at
org.apache.hudi.keygen.TimestampBasedAvroKeyGenerator.getPartitionPath(TimestampBasedAvroKeyGenerator.java:202)
at
org.apache.hudi.keygen.TimestampBasedAvroKeyGenerator.getPartitionPath(TimestampBasedAvroKeyGenerator.java:133)
... 30 more
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]