Virmaline commented on issue #6278:
URL: https://github.com/apache/hudi/issues/6278#issuecomment-1353791597

   @alexeykudinkin
   
   Hey Alexey, 
   
   I'm also still getting the same error after updating to 0.12.1.
   
   Hudi: 0.12.1-amzn-0-SNAPSHOT
   Spark: 3.3.0
   EMR: 6.9.0
   
   ```
   spark-submit \
     --master yarn \
     --deploy-mode cluster \
     --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
     --conf spark.sql.parquet.datetimeRebaseModeInRead=CORRECTED \
     --conf spark.sql.parquet.datetimeRebaseModeInWrite=CORRECTED \
     --conf spark.sql.avro.datetimeRebaseModeInRead=CORRECTED \
     --conf spark.sql.avro.datetimeRebaseModeInWrite=CORRECTED \
     --conf spark.sql.legacy.parquet.datetimeRebaseModeInRead=CORRECTED \
     --conf spark.sql.legacy.parquet.datetimeRebaseModeInWrite=CORRECTED \
     --conf spark.sql.legacy.parquet.int96RebaseModeInRead=CORRECTED \
     --conf spark.sql.legacy.parquet.int96RebaseModeInWrite=CORRECTED \
     --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
     /usr/lib/hudi/hudi-utilities-bundle.jar \
     --table-type COPY_ON_WRITE \
     --source-ordering-field replicadmstimestamp \
     --source-class org.apache.hudi.utilities.sources.ParquetDFSSource \
     --target-base-path s3://bucket/folder/folder/table \
     --target-table table \
     --payload-class org.apache.hudi.common.model.AWSDmsAvroPayload \
     --hoodie-conf hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.TimestampBasedKeyGenerator \
     --hoodie-conf hoodie.deltastreamer.keygen.timebased.timestamp.type=DATE_STRING \
     --hoodie-conf hoodie.deltastreamer.keygen.timebased.output.dateformat=yyyy-MM \
     --hoodie-conf "hoodie.deltastreamer.keygen.timebased.input.dateformat=yyyy-MM-dd HH:mm:ss.SSSSSS" \
     --hoodie-conf hoodie.datasource.write.recordkey.field=_id \
     --hoodie-conf hoodie.datasource.write.partitionpath.field=replicadmstimestamp \
     --hoodie-conf hoodie.deltastreamer.source.dfs.root=s3://bucket/folder/folder/table
   ```

   (Note: I've split the Spark properties into one `--conf` flag each here; `spark-submit` expects a single `key=value` per `--conf`, not a comma-separated list.)
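
   For context on the key generator settings: the input/output dateformat pair should bucket each record into a monthly partition. A quick Python sanity check of that mapping (the sample timestamp is made up; Java's `yyyy-MM-dd HH:mm:ss.SSSSSS` pattern roughly corresponds to Python's `%Y-%m-%d %H:%M:%S.%f`):

   ```python
   from datetime import datetime

   # Hypothetical value for the replicadmstimestamp column
   raw = "2022-05-17 08:31:22.123456"

   # input.dateformat: yyyy-MM-dd HH:mm:ss.SSSSSS
   parsed = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S.%f")

   # output.dateformat: yyyy-MM -> one partition per month
   partition_path = parsed.strftime("%Y-%m")
   print(partition_path)  # 2022-05
   ```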
   
   I've tried just about every combination of the datetimeRebaseMode settings I can think of, and the result is always the same.
   
   Stack trace attached. Is there any possible workaround for this? I currently have a separate process that rewrites the timestamp columns before ingestion, which works, but it adds a lot of overhead to the pipeline.
   
   
[stacktrace.txt](https://github.com/apache/hudi/files/10241150/stacktrace.txt)
   
   

