schlichtanders commented on issue #6808:
URL: https://github.com/apache/hudi/issues/6808#issuecomment-1270155907

   I have now tried again with some different configs:
   
   ```
   from pyspark.sql import SparkSession
   from pathlib import Path
   import os

   os.environ["PYSPARK_SUBMIT_ARGS"] = " ".join([
       # hudi config
       "--packages org.apache.hudi:hudi-spark3.2-bundle_2.12:0.12.0",
       "--conf spark.serializer=org.apache.spark.serializer.KryoSerializer",
       "--conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog",
       "--conf spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension",
       # "--conf spark.sql.hive.convertMetastoreParquet=false",  # taken from AWS example
       # others
       # "--conf spark.hadoop.hive.metastore.uris=jdbc:derby:;databaseName=metastore_db;create=true",
       # "--conf spark.hadoop.hive.metastore.uris=''",
       # f"--conf spark.sql.warehouse.dir={Path('.').absolute() / 'metastore_warehouse'}",
       # "--conf spark.eventLog.enabled=false",
       "--conf spark.sql.catalogImplementation=hive",
       "--conf spark.hadoop.hive.metastore.schema.verification=false",
       "--conf spark.hadoop.hive.metastore.schema.verification.record.version=false",
       "--conf spark.hadoop.javax.jdo.option.ConnectionURL='jdbc:derby:memory:databaseName=metastore_db;create=true'",
       "--conf spark.hadoop.datanucleus.schema.autoCreateTables=true",
       # f"--conf spark.sql.warehouse.dir={Path('.').absolute() / 'metastore_warehouse'}",
       # f"--conf spark.sql.hive.metastore.warehouse.dir={Path('.').absolute() / 'metastore_warehouse'}",
       # necessary last string
       "pyspark-shell",
   ])

   spark = SparkSession.builder.getOrCreate()
   dst_database = "default"
   spark.sql(f"CREATE DATABASE IF NOT EXISTS {dst_database}")

   tableName = "test_hudi_pyspark_local"
   basePath = f"{Path('.').absolute()}/tmp/{tableName}"

   hudi_options = {
       "hoodie.table.name": tableName,
       "hoodie.datasource.write.recordkey.field": "uuid",
       "hoodie.datasource.write.partitionpath.field": "part",
       "hoodie.datasource.write.table.name": tableName,
       "hoodie.datasource.write.operation": "upsert",
       "hoodie.datasource.write.precombine.field": "ts",
       # "hoodie.upsert.shuffle.parallelism": 2,
       # "hoodie.insert.shuffle.parallelism": 2,
       "hoodie.datasource.hive_sync.database": "default",
       "hoodie.datasource.hive_sync.table": tableName,
       "hoodie.datasource.hive_sync.enable": "true",
       # "hoodie.datasource.meta.sync.enable": "true",
       # "hoodie.datasource.hive_sync.mode": "hiveql",
       # "hoodie.datasource.hive_sync.mode": "hms",
       # "hoodie.datasource.hive_sync.mode": "jdbc",
       "hoodie.datasource.hive_sync.use_jdbc": "false",
       # "hoodie.datasource.hive_sync.username": "APP",
       # "hoodie.datasource.hive_sync.jdbcurl": f"jdbc:derby:;databaseName={Path('.').absolute() / 'metastore_db'};create=true",
       # "hoodie.datasource.hive_sync.jdbcurl": "jdbc:derby:;databaseName=metastore_db;create=true",
       "hoodie.datasource.hive_sync.partition_fields": "part",
       "hoodie.datasource.hive_sync.partition_extractor_class": "org.apache.hudi.hive.MultiPartKeysValueExtractor",
       "index.global.enabled": "true",
       "hoodie.index.type": "GLOBAL_BLOOM",
   }

   # illustrative sample data with the uuid/part/ts columns used above
   df = spark.createDataFrame([("id-1", "a", 1)], ["uuid", "part", "ts"])

   (df.write.format("hudi").options(**hudi_options).mode("overwrite").save(basePath))
   ```
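
   A quick sanity check (my own addition, not part of the original setup): each `--conf` has to reach spark-submit as a single `key=value` token, so parsing the assembled string back with `shlex` catches any conf that accidentally got split, e.g. by a stray line break:

   ```python
   import os
   import shlex

   # Parse PYSPARK_SUBMIT_ARGS back into tokens; the fallback default here is
   # only for illustration when the variable is not set.
   submit_args = os.environ.get(
       "PYSPARK_SUBMIT_ARGS",
       "--conf spark.serializer=org.apache.spark.serializer.KryoSerializer pyspark-shell",
   )
   tokens = shlex.split(submit_args)
   # Collect the value following each "--conf" flag.
   conf_values = [tokens[i + 1] for i, t in enumerate(tokens[:-1]) if t == "--conf"]
   assert all("=" in v for v in conf_values), "a --conf was split or malformed"
   print(f"{len(conf_values)} --conf entries parsed cleanly")
   ```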
   
   and get a new, more descriptive error:
   
   ```
   org.apache.hudi.exception.HoodieException: Could not sync using the meta sync class org.apache.hudi.hive.HiveSyncTool
   [...]
   Caused by: java.lang.ClassNotFoundException: org.apache.calcite.rel.type.RelDataTypeSystem
   ```
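
   The missing class `org.apache.calcite.rel.type.RelDataTypeSystem` ships in Apache Calcite's `calcite-core` artifact, which does not seem to be on the classpath here. One thing to try (an untested assumption on my side, including the Calcite version number) is adding `calcite-core` to the `--packages` list next to the Hudi bundle:

   ```python
   # Untested sketch: put calcite-core (which contains RelDataTypeSystem) on the
   # classpath alongside the Hudi bundle. The version 1.32.0 is a guess and has
   # not been verified for compatibility with Hudi 0.12.0.
   packages = ",".join([
       "org.apache.hudi:hudi-spark3.2-bundle_2.12:0.12.0",
       "org.apache.calcite:calcite-core:1.32.0",
   ])
   submit_args = f"--packages {packages} pyspark-shell"
   print(submit_args)
   ```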

