emcegom commented on issue #13805:
URL: https://github.com/apache/hudi/issues/13805#issuecomment-3240642536
Hi @rangareddy
We are facing a similar issue and would like to use a catalog-based
approach for Hudi table operations. Currently, we manage Hudi metadata through
the Hive Metastore (HMS). However, some of our production use cases require
cross-cluster data queries within a single Spark session, which makes
multi-catalog integration necessary.
For example, Iceberg supports multiple Hive Metastore catalogs with a
configuration like the following:
```java
// Iceberg multi-catalog example: two Hive Metastore catalogs in one Spark session
String anotherHiveMetastoreURI = "thrift://another-ip:another-port";
SparkConf sparkConf = new SparkConf()
    // Default session catalog, backed by the primary Hive Metastore
    .set("spark.sql.catalog.spark_catalog", "org.apache.iceberg.spark.SparkSessionCatalog")
    .set("spark.sql.catalog.spark_catalog.type", "hive")
    .set("spark.sql.catalog.spark_catalog.default-namespace", defaultDatabase)
    .set("spark.sql.catalog.spark_catalog.uri", hiveMetastoreURI)
    .set("spark.sql.catalog.spark_catalog.warehouse", warehouse)
    .set("spark.sql.catalog.spark_catalog.hadoop.fs.s3a.access.key", "<access.key>")
    .set("spark.sql.catalog.spark_catalog.hadoop.fs.s3a.secret.key", "<secret.key>")
    .set("spark.sql.catalog.spark_catalog.hadoop.fs.s3a.endpoint", "http://minio-ip-address:port")
    .set("spark.sql.catalog.spark_catalog.hadoop.metastore.catalog.default", defaultCatalogName)
    .set("spark.default.parallelism", "1")
    .set(METASTOREURIS.varname, hiveMetastoreURI) // i.e. "hive.metastore.uris"
    .set("metastore.catalog.default", defaultCatalogName)
    // Second catalog, pointing at another cluster's Hive Metastore
    .set("spark.sql.catalog." + anotherCatalogMappingName, "org.apache.iceberg.spark.SparkCatalog")
    .set("spark.sql.catalog." + anotherCatalogMappingName + ".type", "hive")
    .set("spark.sql.catalog." + anotherCatalogMappingName + ".default-namespace", "default")
    .set("spark.sql.catalog." + anotherCatalogMappingName + ".uri", anotherHiveMetastoreURI)
    .set("spark.sql.catalog." + anotherCatalogMappingName + ".warehouse", warehouse)
    .set("spark.sql.catalog." + anotherCatalogMappingName + ".hadoop.fs.s3a.access.key", "<another.access.key>")
    .set("spark.sql.catalog." + anotherCatalogMappingName + ".hadoop.fs.s3a.secret.key", "<another.secret.key>")
    .set("spark.sql.catalog." + anotherCatalogMappingName + ".hadoop.fs.s3a.endpoint", "http://another-minio.ip-address:another.port")
    .set("spark.sql.catalog." + anotherCatalogMappingName + ".hadoop.metastore.catalog.default", "another_catalog");
```
Is there a similar way to achieve multi-catalog integration with Hudi on
Spark 3.3.1 + Hudi 0.15?
Or is there a recommended best practice for such cross-cluster scenarios?
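For context, this is the kind of configuration we are hoping exists on the Hudi side. To be clear, this is a hypothetical sketch: `org.apache.spark.sql.hudi.catalog.HoodieCatalog` is the catalog class Hudi ships for Spark SQL, but we have not found documentation confirming that it honors per-catalog `.uri` / `.warehouse` keys the way Iceberg's `SparkCatalog` does, and the catalog name `hudi_remote` is made up for illustration.

```java
import org.apache.spark.SparkConf;

public class HudiMultiCatalogSketch {
    public static void main(String[] args) {
        // Hypothetical Hudi analogue of the Iceberg example above.
        // HoodieCatalog itself is real (Hudi with Spark 3.2+), but the
        // per-catalog ".uri" and ".warehouse" keys below are assumptions --
        // we could not confirm that Hudi 0.15 supports them.
        SparkConf sparkConf = new SparkConf()
            .set("spark.sql.extensions",
                 "org.apache.spark.sql.hudi.HoodieSparkSessionExtension")
            // Session catalog on the local cluster's HMS
            .set("spark.sql.catalog.spark_catalog",
                 "org.apache.spark.sql.hudi.catalog.HoodieCatalog")
            // Hypothetical second catalog pointing at another cluster's HMS
            .set("spark.sql.catalog.hudi_remote",
                 "org.apache.spark.sql.hudi.catalog.HoodieCatalog")
            .set("spark.sql.catalog.hudi_remote.uri",
                 "thrift://another-ip:another-port")      // assumed key
            .set("spark.sql.catalog.hudi_remote.warehouse",
                 "s3a://another-warehouse/");             // assumed key
    }
}
```

If HoodieCatalog does not accept these keys, is there another supported mechanism (e.g. per-query Hadoop configuration overrides) for reading Hudi tables registered in a second Hive Metastore?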
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]