schlichtanders commented on issue #6808:
URL: https://github.com/apache/hudi/issues/6808#issuecomment-1270155907
I have now tried some different configs:
```
from pyspark.sql import SparkSession
from pathlib import Path
import os

os.environ["PYSPARK_SUBMIT_ARGS"] = " ".join([
    # hudi config
    "--packages org.apache.hudi:hudi-spark3.2-bundle_2.12:0.12.0",
    "--conf spark.serializer=org.apache.spark.serializer.KryoSerializer",
    "--conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog",
    "--conf spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension",
    # "--conf spark.sql.hive.convertMetastoreParquet=false",  # taken from AWS example
    # others
    # "--conf spark.hadoop.hive.metastore.uris=jdbc:derby:;databaseName=metastore_db;create=true",
    # "--conf spark.hadoop.hive.metastore.uris=''",
    # f"--conf spark.sql.warehouse.dir={Path('.').absolute() / 'metastore_warehouse'}",
    # "--conf spark.eventLog.enabled=false",
    "--conf spark.sql.catalogImplementation=hive",
    "--conf spark.hadoop.hive.metastore.schema.verification=false",
    "--conf spark.hadoop.hive.metastore.schema.verification.record.version=false",
    "--conf spark.hadoop.javax.jdo.option.ConnectionURL='jdbc:derby:memory:databaseName=metastore_db;create=true'",
    "--conf spark.hadoop.datanucleus.schema.autoCreateTables=true",
    # f"--conf spark.sql.warehouse.dir={Path('.').absolute() / 'metastore_warehouse'}",
    # f"--conf spark.sql.hive.metastore.warehouse.dir={Path('.').absolute() / 'metastore_warehouse'}",
    # the trailing "pyspark-shell" string is required
    "pyspark-shell",
])

spark = SparkSession.builder.getOrCreate()

dst_database = "default"
spark.sql(f"CREATE DATABASE IF NOT EXISTS {dst_database}")

tableName = "test_hudi_pyspark_local"
basePath = f"{Path('.').absolute()}/tmp/{tableName}"
hudi_options = {
    "hoodie.table.name": tableName,
    "hoodie.datasource.write.recordkey.field": "uuid",
    "hoodie.datasource.write.partitionpath.field": "part",
    "hoodie.datasource.write.table.name": tableName,
    "hoodie.datasource.write.operation": "upsert",
    "hoodie.datasource.write.precombine.field": "ts",
    # "hoodie.upsert.shuffle.parallelism": 2,
    # "hoodie.insert.shuffle.parallelism": 2,
    "hoodie.datasource.hive_sync.database": "default",
    "hoodie.datasource.hive_sync.table": tableName,
    "hoodie.datasource.hive_sync.enable": "true",
    # "hoodie.datasource.meta.sync.enable": "true",
    # "hoodie.datasource.hive_sync.mode": "hiveql",
    # "hoodie.datasource.hive_sync.mode": "hms",
    # "hoodie.datasource.hive_sync.mode": "jdbc",
    "hoodie.datasource.hive_sync.use_jdbc": "false",
    # "hoodie.datasource.hive_sync.username": "APP",
    # "hoodie.datasource.hive_sync.jdbcurl": f"jdbc:derby:;databaseName={Path('.').absolute() / 'metastore_db'};create=true",
    # "hoodie.datasource.hive_sync.jdbcurl": "jdbc:derby:;databaseName=metastore_db;create=true",
    "hoodie.datasource.hive_sync.partition_fields": "part",
    "hoodie.datasource.hive_sync.partition_extractor_class": "org.apache.hudi.hive.MultiPartKeysValueExtractor",
    "index.global.enabled": "true",
    "hoodie.index.type": "GLOBAL_BLOOM",
}

# df is a Spark DataFrame created earlier in the session (not shown here)
df.write.format("hudi").options(**hudi_options).mode("overwrite").save(basePath)
```
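For context, `df` is not defined in the snippet above; it is presumably a Spark DataFrame created earlier in the session. A minimal sketch of the shape it would need (the column names follow the `recordkey`/`partitionpath`/`precombine` fields in `hudi_options`; the sample rows are invented for illustration and are not part of the original comment):

```python
# Hypothetical sample data matching the fields the Hudi options reference:
# "uuid" (record key), "part" (partition path), "ts" (precombine field).
rows = [
    {"uuid": "id-1", "part": "2022-10", "ts": 1, "value": "first write"},
    {"uuid": "id-1", "part": "2022-10", "ts": 2, "value": "later write"},
    {"uuid": "id-2", "part": "2022-11", "ts": 1, "value": "other key"},
]

# With an active SparkSession this would become the `df` that gets written:
# df = spark.createDataFrame(rows)
```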
and get a new, more descriptive error:
```
org.apache.hudi.exception.HoodieException: Could not sync using the meta sync class org.apache.hudi.hive.HiveSyncTool
[...]
Caused by: java.lang.ClassNotFoundException: org.apache.calcite.rel.type.RelDataTypeSystem
```
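The missing class `org.apache.calcite.rel.type.RelDataTypeSystem` comes from Apache Calcite, which full Hive installations ship on the classpath but the Hudi Spark bundle does not. One hedged thing worth trying (this is an assumption, not part of the original report, and the Calcite version would need to be checked against the Hive version the Hudi bundle builds against) is pulling Calcite in via `--packages`:

```python
# Hypothetical workaround: add the Calcite artifact so the embedded
# metastore client used by HiveSyncTool can load RelDataTypeSystem.
# The 1.10.0 version is an unverified assumption.
packages = ",".join([
    "org.apache.hudi:hudi-spark3.2-bundle_2.12:0.12.0",
    "org.apache.calcite:calcite-core:1.10.0",
])
submit_args = f"--packages {packages} pyspark-shell"
print(submit_args)
```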
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]