soumilshah1995 commented on issue #10231:
URL: https://github.com/apache/hudi/issues/10231#issuecomment-1845922336
Subject: Need Assistance with Apache Hudi Hive Sync Configuration Issue
I'm currently encountering an issue with Hive sync while using Apache Hudi
with Apache Derby as the backend. I have Apache Derby running locally, and I'm
seeking assistance in identifying any missing configurations in my setup.
Here are the relevant configuration files:
Spark Configuration (spark-config.properties):
```
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog
spark.sql.hive.convertMetastoreParquet=false
#hive.metastore.uris=jdbc:derby://localhost:1527/MyDatabase;create=true
```
Hudi Configuration:
```
hoodie.datasource.write.recordkey.field=order_id
hoodie.datasource.write.partitionpath.field=order_date
hoodie.streamer.source.dfs.root=file:////Users/soumilshah/IdeaProjects/SparkProject/apache-hudi-delta-streamer-labs/E5/sampledata/orders
hoodie.datasource.write.precombine.field=ts
hoodie.deltastreamer.csv.header=true
hoodie.deltastreamer.csv.sep=\t
hoodie.datasource.hive_sync.enable=true
hoodie.datasource.hive_sync.use_jdbc=false
hoodie.datasource.hive_sync.mode=hms
hoodie.datasource.hive_sync.jdbcurl=jdbc:derby://localhost:1527/MyDatabase;create=true
hoodie.datasource.hive_sync.database=MyDatabase
hoodie.datasource.hive_sync.hive.metastore.auto.create.all=true
hoodie.datasource.hive_sync.schema=MyDatabase
hoodie.datasource.hive_sync.table=orders
hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.MultiPartKeysValueExtractor
hoodie.datasource.hive_sync.partition_fields=order_date
hoodie.datasource.write.hive_style_partitioning=true
```
Spark Submit Command:
```
spark-submit \
  --class org.apache.hudi.utilities.streamer.HoodieStreamer \
  --packages 'org.apache.hudi:hudi-spark3.4-bundle_2.12:0.14.0' \
  --properties-file /Users/soumilshah/IdeaProjects/SparkProject/apache-hudi-delta-streamer-labs/E5/spark-config.properties \
  --master 'local[*]' \
  --executor-memory 1g \
  /Users/soumilshah/IdeaProjects/SparkProject/apache-hudi-delta-streamer-labs/E5/jar/hudi-utilities-slim-bundle_2.12-0.14.0.jar \
  --table-type COPY_ON_WRITE \
  --op UPSERT \
  --enable-hive-sync \
  --source-ordering-field ts \
  --source-class org.apache.hudi.utilities.sources.CsvDFSSource \
  --target-base-path file:///Users/soumilshah/Downloads/hudidb/ \
  --target-table orders \
  --props hudi_tbl.props
```
Error Message:
```
/12/07 13:40:37 WARN Query: Query for candidates of org.apache.hadoop.hive.metastore.model.MTableColumnStatistics and subclasses resulted in no possible candidates
Required table missing : "CDS" in Catalog "" Schema "". DataNucleus requires this table to perform its persistence operations. Either your MetaData is incorrect, or you need to enable "datanucleus.schema.autoCreateTables"
org.datanucleus.store.rdbms.exceptions.MissingTableException: Required table missing : "CDS" in Catalog "" Schema "". DataNucleus requires this table to perform its persistence operations. Either your MetaData is incorrect, or you need to enable "datanucleus.schema.autoCreateTables"
    at org.datanucleus.store.rdbms.table.AbstractTable.exists(AbstractTable.java:606)
```
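For context, this DataNucleus `MissingTableException` generally means the metastore schema tables (such as `CDS`) have never been created in the backing database. A minimal, hypothetical sketch of a `hive-site.xml` that lets DataNucleus auto-create them against a networked Derby instance (the connection URL and database name here are illustrative, not taken from my setup):

```xml
<!-- Sketch only: allows DataNucleus to create the metastore schema on first use. -->
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby://localhost:1527/metastore_db;create=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.apache.derby.jdbc.ClientDriver</value>
  </property>
  <property>
    <name>datanucleus.schema.autoCreateTables</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
  </property>
</configuration>
```

Alternatively, a Hive installation ships a `schematool` utility (e.g. `schematool -dbType derby -initSchema`) that initializes the metastore schema explicitly instead of relying on auto-create.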
Apache Derby itself is running fine. Even after trying both Hive sync modes (HMS and JDBC), the issue persists; the pipeline works fine when Hive sync is disabled. Could someone please help me identify the problem or any missing configuration?
Thanks in advance for your help!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]