soumilshah1995 opened a new issue, #10231:
URL: https://github.com/apache/hudi/issues/10231
Hello everyone,
I'm running into a small issue that appears to be configuration-related, and I
would appreciate any guidance in pinning it down. This is for my upcoming
videos covering the Hudi Hive Sync tool in detail.
I've started the Spark Thrift Server using the following command:
```
spark-submit \
--master 'local[*]' \
--conf spark.executor.extraJavaOptions=-Duser.timezone=Etc/UTC \
--conf spark.eventLog.enabled=false \
--class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 \
--name "Thrift JDBC/ODBC Server" \
--executor-memory 512m \
--packages org.apache.spark:spark-hive_2.12:3.4.0
```
Additionally, I have Beeline installed and connected to the default database:
```
beeline -u jdbc:hive2://localhost:10000/default
```
While the Delta Streamer ingest itself works fine, the job fails when syncing
to the Hive Metastore.
Here's my Spark submit command for the Hudi Delta Streamer:
```
spark-submit \
--class org.apache.hudi.utilities.streamer.HoodieStreamer \
--packages 'org.apache.hudi:hudi-spark3.4-bundle_2.12:0.14.0,org.apache.hadoop:hadoop-aws:3.3.2' \
--repositories 'https://repo.maven.apache.org/maven2' \
--properties-file /Users/soumilshah/IdeaProjects/SparkProject/apache-hudi-delta-streamer-labs/E5/spark-config.properties \
--master 'local[*]' \
--executor-memory 1g \
/Users/soumilshah/IdeaProjects/SparkProject/apache-hudi-delta-streamer-labs/E5/jar/hudi-utilities-slim-bundle_2.12-0.14.0.jar \
--table-type COPY_ON_WRITE \
--op UPSERT \
--enable-hive-sync \
--source-ordering-field ts \
--source-class org.apache.hudi.utilities.sources.CsvDFSSource \
--target-base-path file:///Users/soumilshah/Downloads/hudidb/ \
--target-table orders \
--props hudi_tbl.props
```
Hudi config (`hudi_tbl.props`):
```
hoodie.datasource.write.recordkey.field=order_id
hoodie.datasource.write.partitionpath.field=order_date
hoodie.streamer.source.dfs.root=file:////Users/soumilshah/IdeaProjects/SparkProject/apache-hudi-delta-streamer-labs/E5/sampledata/orders
hoodie.datasource.write.precombine.field=ts
hoodie.deltastreamer.csv.header=true
hoodie.deltastreamer.csv.sep=\t
hoodie.datasource.hive_sync.enable=true
hoodie.datasource.hive_sync.mode=jdbc
hoodie.datasource.hive_sync.jdbcurl=jdbc:hive2://localhost:10000
hoodie.datasource.hive_sync.database=default
hoodie.datasource.hive_sync.table=orders
hoodie.datasource.hive_sync.partition_fields=order_date
```
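As a cross-check, Hudi's Hive sync also supports an `hms` mode that talks to the metastore's Thrift endpoint directly instead of going through the JDBC server; a minimal sketch, assuming a metastore service on the default port 9083 (the URI here is an assumption, not part of the original setup):

```
hoodie.datasource.hive_sync.mode=hms
hoodie.datasource.hive_sync.metastore.uris=thrift://localhost:9083
```

If the error disappears in `hms` mode, that would point at the JDBC/Thrift server's own metastore rather than the Hudi side of the configuration.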
Spark config (`spark-config.properties`):
```
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog
spark.sql.hive.convertMetastoreParquet=false
```
The error I'm encountering is:
```
Required table missing : "VERSION" in Catalog "" Schema "". DataNucleus requires this table to perform its persistence operations. Either your MetaData is incorrect, or you need to enable "datanucleus.schema.autoCreateTables"
org.datanucleus.store.rdbms.exceptions.MissingTableException: Required table missing : "VERSION" in Catalog "" Schema "". DataNucleus requires this table to perform its persistence operations. Either your MetaData is incorrect, or you need to enable "datanucleus.schema.autoCreateTables"
    at org.datanucleus.store.rdbms.table.AbstractTable.exists(AbstractTable.java:606)
    at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.performTablesValidation(RDBMSStoreManager.java:3385)
```
Any assistance in identifying what might be missing or misconfigured would
be highly appreciated.
Thank you!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]