Hi All,

I am using HoodieDeltaStreamer (hoodie-0.4.7) to migrate a small table. The
data is written successfully in Parquet format, but the Hive sync fails.

Here's the stack trace:

```
19/10/14 17:02:12 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
19/10/14 17:02:12 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.ClassCastException: org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore cannot be cast to com.uber.hoodie.org.apache.hadoop_hive.metastore.PartitionExpressionProxy
java.lang.ClassCastException: org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore cannot be cast to com.uber.hoodie.org.apache.hadoop_hive.metastore.PartitionExpressionProxy
        at com.uber.hoodie.org.apache.hadoop_hive.metastore.ObjectStore.createExpressionProxy(ObjectStore.java:367)
        at com.uber.hoodie.org.apache.hadoop_hive.metastore.ObjectStore.initialize(ObjectStore.java:345)
        at com.uber.hoodie.org.apache.hadoop_hive.metastore.ObjectStore.setConf(ObjectStore.java:298)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
        at com.uber.hoodie.org.apache.hadoop_hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:60)
        at com.uber.hoodie.org.apache.hadoop_hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:69)
        at com.uber.hoodie.org.apache.hadoop_hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:682)
        at com.uber.hoodie.org.apache.hadoop_hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:660)
        at com.uber.hoodie.org.apache.hadoop_hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:709)
        at com.uber.hoodie.org.apache.hadoop_hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:508)
        at com.uber.hoodie.org.apache.hadoop_hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:78)
        at com.uber.hoodie.org.apache.hadoop_hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:84)
        at com.uber.hoodie.org.apache.hadoop_hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:6481)
        at com.uber.hoodie.org.apache.hadoop_hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:207)
        at com.uber.hoodie.org.apache.hadoop_hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:187)
        at com.uber.hoodie.hive.HoodieHiveClient.<init>(HoodieHiveClient.java:102)
        at com.uber.hoodie.hive.HiveSyncTool.<init>(HiveSyncTool.java:61)
        at com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer.syncHive(HoodieDeltaStreamer.java:328)
        at com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:298)
        at com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:469)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:688)
```


Here are the properties I am using:


```

hoodie.upsert.shuffle.parallelism=2
hoodie.insert.shuffle.parallelism=2
hoodie.bulkinsert.shuffle.parallelism=2

# Key fields, for kafka example
hoodie.datasource.write.recordkey.field=<primary_key>
hoodie.datasource.write.partitionpath.field=
# schema provider configs
hoodie.deltastreamer.schemaprovider.registry.url=http://localhost:8081/subjects/schema_name/versions/latest
# Hive sync
hoodie.datasource.hive_sync.database=default
hoodie.datasource.hive_sync.table=table_name
hoodie.datasource.hive_sync.jdbcurl=jdbc:hive2://localhost:10000
hoodie.datasource.hive_sync.partition_fields=

hoodie.deltastreamer.source.kafka.topic=topic_name
# Kafka props
metadata.broker.list=localhost:9092
auto.offset.reset=smallest
schema.registry.url=http://localhost:8081

```
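
For completeness, the job is submitted to YARN roughly as below (the jar path,
properties path, ordering field and base path are placeholders, not my exact
values):

```

# Representative HoodieDeltaStreamer (0.4.7) submission; paths and the
# ordering field below are placeholders.
spark-submit \
  --master yarn --deploy-mode cluster \
  --class com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer \
  /path/to/hoodie-utilities-0.4.7.jar \
  --props file:///path/to/kafka-source.properties \
  --schemaprovider-class com.uber.hoodie.utilities.schema.SchemaRegistryProvider \
  --source-class com.uber.hoodie.utilities.sources.AvroKafkaSource \
  --source-ordering-field <ordering_field> \
  --target-base-path /path/to/base_path \
  --target-table table_name \
  --storage-type COPY_ON_WRITE \
  --enable-hive-sync

```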


The table does not have partitions, hence I have kept
`hoodie.datasource.write.partitionpath.field` blank,

so the data is written to the `default` directory.

Also, `hoodie.datasource.hive_sync.partition_fields` is left blank for the
same reason.
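
In case it is relevant: my understanding from the 0.4.x docs is that
non-partitioned tables are usually paired with the non-partitioned key
generator and Hive partition extractor, which I have not set. Something like
the following (class names as I found them in the docs; I have not verified
them on my setup):

```

# Assumed from the hoodie 0.4.x docs for non-partitioned tables;
# not verified on my setup.
hoodie.datasource.write.keygenerator.class=com.uber.hoodie.NonpartitionedKeyGenerator
hoodie.datasource.hive_sync.partition_extractor_class=com.uber.hoodie.hive.NonPartitionedExtractor

```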


Regards,

Gurudatt
