soumilshah1995 opened a new issue, #10231:
URL: https://github.com/apache/hudi/issues/10231
Hello everyone,
I'm running into a small issue that appears to be configuration-related, and I
would appreciate any guidance in pinning it down. This is for my upcoming
videos covering the Hudi Hive Sync tool in detail.
I've started the Spark Thrift Server using the following command:
```
spark-submit \
--master 'local[*]' \
--conf spark.executor.extraJavaOptions=-Duser.timezone=Etc/UTC \
--conf spark.eventLog.enabled=false \
--class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 \
--name "Thrift JDBC/ODBC Server" \
--executor-memory 512m \
--packages org.apache.spark:spark-hive_2.12:3.4.0
```
Additionally, I have Beeline installed and connected to the default database:
```
beeline -u jdbc:hive2://localhost:10000/default
```
While the Delta Streamer ingest itself works fine, the job fails when syncing
to the Hive Metastore.
Here's my Spark submit command for the Hudi Delta Streamer:
```
spark-submit \
--class org.apache.hudi.utilities.streamer.HoodieStreamer \
--packages 'org.apache.hudi:hudi-spark3.4-bundle_2.12:0.14.0,org.apache.hadoop:hadoop-aws:3.3.2' \
--repositories 'https://repo.maven.apache.org/maven2' \
--properties-file /Users/soumilshah/IdeaProjects/SparkProject/apache-hudi-delta-streamer-labs/E5/spark-config.properties \
--master 'local[*]' \
--executor-memory 1g \
/Users/soumilshah/IdeaProjects/SparkProject/apache-hudi-delta-streamer-labs/E5/jar/hudi-utilities-slim-bundle_2.12-0.14.0.jar \
--table-type COPY_ON_WRITE \
--op UPSERT \
--enable-hive-sync \
--source-ordering-field ts \
--source-class org.apache.hudi.utilities.sources.CsvDFSSource \
--target-base-path file:///Users/soumilshah/Downloads/hudidb/ \
--target-table orders \
--props hudi_tbl.props
```
Hudi config (`hudi_tbl.props`):
```
hoodie.datasource.write.recordkey.field=order_id
hoodie.datasource.write.partitionpath.field=order_date
hoodie.streamer.source.dfs.root=file:////Users/soumilshah/IdeaProjects/SparkProject/apache-hudi-delta-streamer-labs/E5/sampledata/orders
hoodie.datasource.write.precombine.field=ts
hoodie.deltastreamer.csv.header=true
hoodie.deltastreamer.csv.sep=\t
hoodie.datasource.hive_sync.enable=true
hoodie.datasource.hive_sync.mode=jdbc
hoodie.datasource.hive_sync.jdbcurl=jdbc:hive2://localhost:10000
hoodie.datasource.hive_sync.database=default
hoodie.datasource.hive_sync.table=orders
hoodie.datasource.hive_sync.partition_fields=order_date
```
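As a cross-check, Hudi's Hive sync also supports an `hms` mode that talks to the metastore's Thrift endpoint directly instead of going through the JDBC server; a minimal sketch, assuming a metastore service on the default port 9083 (the URI here is an assumption, not part of the original setup):

```
hoodie.datasource.hive_sync.mode=hms
hoodie.datasource.hive_sync.metastore.uris=thrift://localhost:9083
```

If the error disappears in `hms` mode, that would point at the JDBC/Thrift server's own metastore rather than the Hudi side of the configuration.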
Spark config (`spark-config.properties`):
```
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog
spark.sql.hive.convertMetastoreParquet=false
```
The error I'm encountering is:
```
Required table missing : "VERSION" in Catalog "" Schema "". DataNucleus requires this table to perform its persistence operations. Either your MetaData is incorrect, or you need to enable "datanucleus.schema.autoCreateTables"
org.datanucleus.store.rdbms.exceptions.MissingTableException: Required table missing : "VERSION" in Catalog "" Schema "". DataNucleus requires this table to perform its persistence operations. Either your MetaData is incorrect, or you need to enable "datanucleus.schema.autoCreateTables"
    at org.datanucleus.store.rdbms.table.AbstractTable.exists(AbstractTable.java:606)
    at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.performTablesValidation(RDBMSStoreManager.java:3385)
```
Any assistance in identifying what might be missing or misconfigured would
be highly appreciated.
Thank you!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]