jackye1995 opened a new pull request #1640: URL: https://github.com/apache/iceberg/pull/1640
As multiple new Catalog implementations are being added to Iceberg, we need a way to load those Catalogs in Spark and Flink easily. Currently there is a simple switch branch that chooses between the `hive` and `hadoop` catalogs. This approach requires the `iceberg-spark` and `iceberg-flink` modules to take a dependency on the catalog implementation modules, which would pull in many unnecessary dependencies as more and more cloud providers add support for Iceberg.

This PR proposes the following way to load custom Catalog implementations:

1. The `type` of a custom catalog is always named `custom`.
2. An `impl` property is used to determine the implementation class of the catalog.
3. The catalog must be initializable using only the Hadoop configuration object, or through a no-arg constructor.

For example, a `GlueCatalog` would be configured in Spark like the following:

```
spark.sql.catalog.glue = org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.glue.type = custom
spark.sql.catalog.glue.impl = org.apache.iceberg.aws.glue.GlueCatalog
```
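The reflection-based loading implied by the `impl` property and rule 3 can be sketched as follows. This is a minimal illustration, not the actual Iceberg loader: `Catalog` and `ExampleCatalog` here are hypothetical stand-ins, and only the no-arg constructor path is shown (the real proposal also allows initialization from a Hadoop `Configuration`).

```java
// Hypothetical stand-in for the Iceberg Catalog interface.
interface Catalog {
    String name();
}

// Hypothetical custom catalog with a no-arg constructor, as required by rule 3.
class ExampleCatalog implements Catalog {
    public String name() {
        return "example";
    }
}

public class CatalogLoader {
    // Instantiate the class named by the `impl` property via its no-arg constructor.
    static Catalog loadCatalog(String impl) {
        try {
            Class<?> clazz = Class.forName(impl);
            return (Catalog) clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot load catalog impl: " + impl, e);
        }
    }

    public static void main(String[] args) {
        // The impl string would come from e.g. spark.sql.catalog.glue.impl.
        Catalog catalog = loadCatalog("ExampleCatalog");
        System.out.println(catalog.name()); // prints "example"
    }
}
```

Because the class name is resolved at runtime, `iceberg-spark` and `iceberg-flink` need no compile-time dependency on any particular catalog module; a bad `impl` value simply fails with a descriptive exception.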
