soumilshah1995 commented on issue #10231:
URL: https://github.com/apache/hudi/issues/10231#issuecomment-1845922336
Subject: Need Assistance with Apache Hudi Hive Sync Configuration Issue
I'm currently encountering an issue with Hive sync while using Apache Hudi
with Apache Derby as the backend. I have Apache Derby running locally, and I'm
seeking assistance in identifying any missing configurations in my setup.
Here are the relevant configuration files:
Spark Configuration (spark-config.properties):
```
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog
spark.sql.hive.convertMetastoreParquet=false
#hive.metastore.uris=jdbc:derby://localhost:1527/MyDatabase;create=true
```
Hudi Configuration:
```
hoodie.datasource.write.recordkey.field=order_id
hoodie.datasource.write.partitionpath.field=order_date
hoodie.streamer.source.dfs.root=file:////Users/soumilshah/IdeaProjects/SparkProject/apache-hudi-delta-streamer-labs/E5/sampledata/orders
hoodie.datasource.write.precombine.field=ts
hoodie.deltastreamer.csv.header=true
hoodie.deltastreamer.csv.sep=\t
hoodie.datasource.hive_sync.enable=true
hoodie.datasource.hive_sync.use_jdbc=false
hoodie.datasource.hive_sync.mode=hms
hoodie.datasource.hive_sync.jdbcurl=jdbc:derby://localhost:1527/MyDatabase;create=true
hoodie.datasource.hive_sync.database=MyDatabase
hoodie.datasource.hive_sync.hive.metastore.auto.create.all=true
hoodie.datasource.hive_sync.schema=MyDatabase
hoodie.datasource.hive_sync.table=orders
hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.MultiPartKeysValueExtractor
hoodie.datasource.hive_sync.partition_fields=order_date
hoodie.datasource.write.hive_style_partitioning=true
```
Spark Submit Command:
```
spark-submit \
  --class org.apache.hudi.utilities.streamer.HoodieStreamer \
  --packages 'org.apache.hudi:hudi-spark3.4-bundle_2.12:0.14.0' \
  --properties-file /Users/soumilshah/IdeaProjects/SparkProject/apache-hudi-delta-streamer-labs/E5/spark-config.properties \
  --master 'local[*]' \
  --executor-memory 1g \
  /Users/soumilshah/IdeaProjects/SparkProject/apache-hudi-delta-streamer-labs/E5/jar/hudi-utilities-slim-bundle_2.12-0.14.0.jar \
  --table-type COPY_ON_WRITE \
  --op UPSERT \
  --enable-hive-sync \
  --source-ordering-field ts \
  --source-class org.apache.hudi.utilities.sources.CsvDFSSource \
  --target-base-path file:///Users/soumilshah/Downloads/hudidb/ \
  --target-table orders \
  --props hudi_tbl.props
```
Error Message:
```
/12/07 13:40:37 WARN Query: Query for candidates of org.apache.hadoop.hive.metastore.model.MTableColumnStatistics and subclasses resulted in no possible candidates
Required table missing : "CDS" in Catalog "" Schema "". DataNucleus requires this table to perform its persistence operations. Either your MetaData is incorrect, or you need to enable "datanucleus.schema.autoCreateTables"
org.datanucleus.store.rdbms.exceptions.MissingTableException: Required table missing : "CDS" in Catalog "" Schema "". DataNucleus requires this table to perform its persistence operations. Either your MetaData is incorrect, or you need to enable "datanucleus.schema.autoCreateTables"
    at org.datanucleus.store.rdbms.table.AbstractTable.exists(AbstractTable.java:606)
```
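For context, this DataNucleus `MissingTableException` generally means the metastore schema tables (such as `CDS`) have never been created in the backing database. A minimal, hypothetical sketch of a `hive-site.xml` that lets DataNucleus auto-create them against a networked Derby instance (the connection URL and database name here are illustrative, not taken from my setup):

```xml
<!-- Sketch only: allows DataNucleus to create the metastore schema on first use. -->
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby://localhost:1527/metastore_db;create=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.apache.derby.jdbc.ClientDriver</value>
  </property>
  <property>
    <name>datanucleus.schema.autoCreateTables</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
  </property>
</configuration>
```

Alternatively, a Hive installation ships a `schematool` utility (e.g. `schematool -dbType derby -initSchema`) that initializes the metastore schema explicitly instead of relying on auto-create.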
Apache Derby itself is running fine. Even after trying both Hive sync modes (HMS and JDBC), the issue persists; the pipeline works fine when Hive sync is disabled. Could someone please help me identify the problem or any missing configuration?
Thanks in advance for your help!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]