Hi All, I am using HoodieDeltaStreamer (hoodie-0.4.7) to migrate a small table. The data is written successfully in Parquet format, but the Hive sync fails.
Here's the stack trace:

```
19/10/14 17:02:12 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
19/10/14 17:02:12 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.ClassCastException: org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore cannot be cast to com.uber.hoodie.org.apache.hadoop_hive.metastore.PartitionExpressionProxy
java.lang.ClassCastException: org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore cannot be cast to com.uber.hoodie.org.apache.hadoop_hive.metastore.PartitionExpressionProxy
    at com.uber.hoodie.org.apache.hadoop_hive.metastore.ObjectStore.createExpressionProxy(ObjectStore.java:367)
    at com.uber.hoodie.org.apache.hadoop_hive.metastore.ObjectStore.initialize(ObjectStore.java:345)
    at com.uber.hoodie.org.apache.hadoop_hive.metastore.ObjectStore.setConf(ObjectStore.java:298)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
    at com.uber.hoodie.org.apache.hadoop_hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:60)
    at com.uber.hoodie.org.apache.hadoop_hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:69)
    at com.uber.hoodie.org.apache.hadoop_hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:682)
    at com.uber.hoodie.org.apache.hadoop_hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:660)
    at com.uber.hoodie.org.apache.hadoop_hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:709)
    at com.uber.hoodie.org.apache.hadoop_hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:508)
    at com.uber.hoodie.org.apache.hadoop_hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:78)
    at com.uber.hoodie.org.apache.hadoop_hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:84)
    at com.uber.hoodie.org.apache.hadoop_hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:6481)
    at com.uber.hoodie.org.apache.hadoop_hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:207)
    at com.uber.hoodie.org.apache.hadoop_hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:187)
    at com.uber.hoodie.hive.HoodieHiveClient.<init>(HoodieHiveClient.java:102)
    at com.uber.hoodie.hive.HiveSyncTool.<init>(HiveSyncTool.java:61)
    at com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer.syncHive(HoodieDeltaStreamer.java:328)
    at com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:298)
    at com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:469)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:688)
```

Here are the properties I am using:

```
hoodie.upsert.shuffle.parallelism=2
hoodie.insert.shuffle.parallelism=2
hoodie.bulkinsert.shuffle.parallelism=2

# Key fields, for kafka example
hoodie.datasource.write.recordkey.field=<primary_key>
hoodie.datasource.write.partitionpath.field=

# schema provider configs
hoodie.deltastreamer.schemaprovider.registry.url=http://localhost:8081/subjects/schema_name/versions/latest

# Kafka Source
hoodie.datasource.hive_sync.database=default
hoodie.datasource.hive_sync.table=table_name
hoodie.datasource.hive_sync.jdbcurl=jdbc:hive2://localhost:10000
hoodie.datasource.hive_sync.partition_fields=
hoodie.deltastreamer.source.kafka.topic=topic_name

# Kafka props
metadata.broker.list=localhost:9092
auto.offset.reset=smallest
schema.registry.url=http://localhost:8081
```

The table does not have partitions, so I have left `hoodie.datasource.write.partitionpath.field` blank, which means the data is written under the `default` directory. The `hoodie.datasource.hive_sync.partition_fields` property is also left blank for the same reason.
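For context, this is roughly how I am submitting the job; the jar path, base path, and ordering field below are placeholders rather than my exact values:

```
spark-submit \
  --class com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer \
  --master yarn --deploy-mode cluster \
  /path/to/hoodie-utilities-0.4.7.jar \
  --props file:///path/to/kafka-source.properties \
  --schemaprovider-class com.uber.hoodie.utilities.schema.SchemaRegistryProvider \
  --source-class com.uber.hoodie.utilities.sources.AvroKafkaSource \
  --source-ordering-field <ordering_field> \
  --target-base-path /path/to/target \
  --target-table table_name \
  --op UPSERT \
  --enable-hive-sync
```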
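One more thing that may be relevant: I have not configured a key generator or a partition extractor explicitly. My understanding from the docs is that non-partitioned tables are usually declared with something like the following; the class names are my reading of the docs, and I have not verified them against 0.4.7:

```
# Assumed settings for non-partitioned tables (class names unverified on 0.4.7)
hoodie.datasource.write.keygenerator.class=com.uber.hoodie.NonpartitionedKeyGenerator
hoodie.datasource.hive_sync.partition_extractor_class=com.uber.hoodie.hive.NonPartitionedExtractor
```

Should I be setting these rather than just leaving the partition fields blank?

Regards,
Gurudatt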