Hi All,

I have a use case where data is pulled from a database by Kafka
Connect using the JDBC connector, and the data is in Avro format. I am
trying to partition the table by date, but the column is of type long.
Below is the Avro schema for the column:

        {
          "type": "long",
          "connect.version": 1,
          "connect.name": "org.apache.kafka.connect.data.Timestamp",
          "logicalType": "timestamp-millis"
        }
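
For clarity, my understanding is that the Connect Timestamp logical
type stores the value as epoch milliseconds in that long. With a
hypothetical value (e.g. in jshell):

    import java.time.Instant;

    // hypothetical value of datetime_column as pulled by the JDBC connector;
    // the Connect Timestamp logical type encodes epoch *milliseconds*
    long datetimeColumn = 1572620452000L;
    Instant.ofEpochMilli(datetimeColumn);   // ==> 2019-11-01T15:00:52Z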

I have added the following configuration for DeltaStreamer:

hoodie.datasource.write.partitionpath.field=datetime_column
hoodie.datasource.write.keygenerator.class=org.apache.hudi.utilities.keygen.TimestampBasedKeyGenerator
hoodie.deltastreamer.keygen.timebased.timestamp.type=UNIX_TIMESTAMP
hoodie.deltastreamer.keygen.timebased.output.dateformat=yyyy/MM/dd
hoodie.datasource.hive_sync.partition_fields=datetime_column

Data is written successfully, but the partition sync fails due to a
data type mismatch on the column. Also, I checked
TimestampBasedKeyGenerator, and it only supports timestamps in seconds,
so with my column holding epoch milliseconds the generated partition
path is wrong.
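
To illustrate with the same hypothetical value as above (this is my
reading of the behaviour, I haven't verified it against the keygen
code):

    // treating the epoch-millis value as epoch *seconds* pushes the
    // derived date tens of thousands of years into the future
    // (roughly year 51,800 for this value) instead of 2019-11-01
    Instant.ofEpochSecond(1572620452000L);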

Here's the error:

19/11/01 15:00:52 ERROR yarn.ApplicationMaster: User class threw exception: org.apache.hudi.hive.HoodieHiveSyncException: Failed to sync partitions for table table_name
org.apache.hudi.hive.HoodieHiveSyncException: Failed to sync partitions for table table_name
  at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:172)
  at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:107)
  at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:71)
  at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncHive(DeltaSync.java:440)
  at org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:382)
  at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:226)
  at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:120)
  at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:292)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:688)
Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Failed in executing SQL ALTER TABLE default.table_name ADD IF NOT EXISTS PARTITION (`datetime_column`='11536-04-01') LOCATION 'hdfs:/tmp/hoodie/tables/table_name/11536/04/01'
  at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQL(HoodieHiveClient.java:476)
  at org.apache.hudi.hive.HoodieHiveClient.addPartitionsToTable(HoodieHiveClient.java:139)
  at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:167)
  ... 12 more
Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException [Error 10248]: Cannot add partition column datetime_column of type string as it cannot be converted to type bigint
  at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:241)
  at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:227)
  at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:255)
  at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQL(HoodieHiveClient.java:474)
  ... 14 more



Has anyone faced this use case before? Any thoughts on how to handle it?
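
For reference, one workaround I was considering (untested; I'm assuming
a SQL transformer can be chained in DeltaStreamer to pre-convert the
column, and datetime_seconds is just a column name I made up) is
something like:

--transformer-class org.apache.hudi.utilities.transform.SqlQueryBasedTransformer

hoodie.deltastreamer.transformer.sql=SELECT *, CAST(datetime_column / 1000 AS BIGINT) AS datetime_seconds FROM <SRC>
hoodie.datasource.write.partitionpath.field=datetime_seconds
hoodie.datasource.hive_sync.partition_fields=datetime_seconds

That would at least address the seconds vs. milliseconds issue, though
I suspect the Hive partition type mismatch would still remain.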


Regards,

Gurudatt
