jackye1995 opened a new pull request #1640: URL: https://github.com/apache/iceberg/pull/1640
As multiple new Catalog implementations are being added to Iceberg, we need a way to load those Catalogs in Spark and Flink easily. Currently there is a simple switch branch that chooses between the `hive` and `hadoop` catalogs. This approach requires the `iceberg-spark` and `iceberg-flink` modules to take a dependency on the catalog implementation modules, which would pull in many unnecessary dependencies as more and more cloud providers add support for Iceberg.

This PR proposes the following way to load custom Catalog implementations:

1. The `type` of a custom catalog is always named `custom`.
2. An `impl` property is used to determine the implementation class of the catalog.
3. The catalog must be initializable using only the Hadoop configuration object, or through a no-arg constructor.

For example, a `GlueCatalog` would be configured in Spark like the following:

```
spark.sql.catalog.glue = org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.glue.type = custom
spark.sql.catalog.glue.impl = org.apache.iceberg.aws.glue.GlueCatalog
```
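The reflection-based loading implied by the `impl` property and rule 3 can be sketched as follows. This is a minimal illustration, not the actual Iceberg loader: `Catalog` and `ExampleCatalog` here are hypothetical stand-ins, and only the no-arg constructor path is shown (the real proposal also allows initialization from a Hadoop `Configuration`).

```java
// Hypothetical stand-in for the Iceberg Catalog interface.
interface Catalog {
    String name();
}

// Hypothetical custom catalog with a no-arg constructor, as required by rule 3.
class ExampleCatalog implements Catalog {
    public String name() {
        return "example";
    }
}

public class CatalogLoader {
    // Instantiate the class named by the `impl` property via its no-arg constructor.
    static Catalog loadCatalog(String impl) {
        try {
            Class<?> clazz = Class.forName(impl);
            return (Catalog) clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot load catalog impl: " + impl, e);
        }
    }

    public static void main(String[] args) {
        // The impl string would come from e.g. spark.sql.catalog.glue.impl.
        Catalog catalog = loadCatalog("ExampleCatalog");
        System.out.println(catalog.name()); // prints "example"
    }
}
```

Because the class name is resolved at runtime, `iceberg-spark` and `iceberg-flink` need no compile-time dependency on any particular catalog module; a bad `impl` value simply fails with a descriptive exception.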
