This is an automated email from the ASF dual-hosted git repository.
szehon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/iceberg.git
The following commit(s) were added to refs/heads/master by this push:
new 5117b6bfa9 Docs: Update new catalog features (#7433)
5117b6bfa9 is described below
commit 5117b6bfa990f7edcc122212180de7317d4ccb78
Author: Hongyue/Steve Zhang <[email protected]>
AuthorDate: Tue May 2 17:39:19 2023 -0700
Docs: Update new catalog features (#7433)
---
docs/spark-configuration.md | 18 +++++++++++++++---
.../java/org/apache/iceberg/spark/SparkCatalog.java | 13 ++++++++++---
.../java/org/apache/iceberg/spark/SparkCatalog.java | 13 ++++++++++---
3 files changed, 35 insertions(+), 9 deletions(-)
diff --git a/docs/spark-configuration.md b/docs/spark-configuration.md
index 926ec0207d..4dbe527aee 100644
--- a/docs/spark-configuration.md
+++ b/docs/spark-configuration.md
@@ -40,6 +40,14 @@ spark.sql.catalog.hive_prod.uri = thrift://metastore-host:port
# omit uri to use the same URI as Spark: hive.metastore.uris in hive-site.xml
```
+Below is an example of a REST catalog named `rest_prod` that loads tables from the REST URL `http://localhost:8080`:
+
+```plain
+spark.sql.catalog.rest_prod = org.apache.iceberg.spark.SparkCatalog
+spark.sql.catalog.rest_prod.type = rest
+spark.sql.catalog.rest_prod.uri = http://localhost:8080
+```
+
Iceberg also supports a directory-based catalog in HDFS that can be configured using `type=hadoop`:
```plain
@@ -66,12 +74,16 @@ Both catalogs are configured using properties nested under the catalog name. Com
| Property | Values | Description |
| -------------------------------------------------- | ----------------------------- | -------------------------------------------------------------------- |
| spark.sql.catalog._catalog-name_.type | `hive`, `hadoop` or `rest` | The underlying Iceberg catalog implementation, `HiveCatalog`, `HadoopCatalog`, `RESTCatalog` or left unset if using a custom catalog |
-| spark.sql.catalog._catalog-name_.catalog-impl | | The underlying Iceberg catalog implementation.|
+| spark.sql.catalog._catalog-name_.catalog-impl | | The custom Iceberg catalog implementation. If `type` is unset, `catalog-impl` must be set. |
+| spark.sql.catalog._catalog-name_.io-impl | | The custom FileIO implementation. |
+| spark.sql.catalog._catalog-name_.metrics-reporter-impl | | The custom MetricsReporter implementation. |
| spark.sql.catalog._catalog-name_.default-namespace | default | The default current namespace for the catalog |
-| spark.sql.catalog._catalog-name_.uri | thrift://host:port | Metastore connect URI; default from `hive-site.xml` |
+| spark.sql.catalog._catalog-name_.uri | thrift://host:port | Hive metastore URL for a Hive catalog, or REST URL for a REST catalog |
| spark.sql.catalog._catalog-name_.warehouse | hdfs://nn:8020/warehouse/path | Base path for the warehouse directory |
| spark.sql.catalog._catalog-name_.cache-enabled | `true` or `false` | Whether to enable catalog cache, default value is `true` |
-| spark.sql.catalog._catalog-name_.cache.expiration-interval-ms | `30000` (30 seconds) | Duration after which cached catalog entries are expired; Only effective if `cache-enabled` is `true`. `-1` disables cache expiration and `0` disables caching entirely, irrespective of `cache-enabled`. Default is `30000` (30 seconds) | |
+| spark.sql.catalog._catalog-name_.cache.expiration-interval-ms | `30000` (30 seconds) | Duration after which cached catalog entries are expired; Only effective if `cache-enabled` is `true`. `-1` disables cache expiration and `0` disables caching entirely, irrespective of `cache-enabled`. Default is `30000` (30 seconds) |
+| spark.sql.catalog._catalog-name_.table-default._propertyKey_ | | Default Iceberg table property value for property key _propertyKey_, which will be set on tables created by this catalog if not overridden |
+| spark.sql.catalog._catalog-name_.table-override._propertyKey_ | | Enforced Iceberg table property value for property key _propertyKey_, which cannot be overridden by users |
Additional properties can be found in common [catalog configuration](../configuration#catalog-properties).
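For illustration, the `table-default` and `table-override` rows added above could be combined like this. This is a sketch only: it reuses the `hive_prod` catalog name from the earlier example, and the two table properties shown (`format-version` and `write.parquet.compression-codec`) are standard Iceberg table properties chosen here for demonstration:

```plain
# New tables default to format-version 2 unless the user overrides it at create time
spark.sql.catalog.hive_prod.table-default.format-version = 2
# All tables written through this catalog use zstd, regardless of user settings
spark.sql.catalog.hive_prod.table-override.write.parquet.compression-codec = zstd
```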
diff --git a/spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java b/spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java
index 3ad3f5d0ee..cae62486ca 100644
--- a/spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java
+++ b/spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java
@@ -89,16 +89,23 @@ import org.apache.spark.sql.util.CaseInsensitiveStringMap;
* <p>This supports the following catalog configuration options:
*
* <ul>
- * <li><code>type</code> - catalog type, "hive" or "hadoop". To specify a non-hive or hadoop
- *     catalog, use the <code>catalog-impl</code> option.
- * <li><code>uri</code> - the Hive Metastore URI (Hive catalog only)
+ * <li><code>type</code> - catalog type, "hive", "hadoop" or "rest". To specify a catalog that is
+ *     not Hive or Hadoop, use the <code>catalog-impl</code> option.
+ * <li><code>uri</code> - the Hive Metastore URI for a Hive catalog, or the REST URI for a REST catalog
* <li><code>warehouse</code> - the warehouse path (Hadoop catalog only)
* <li><code>catalog-impl</code> - a custom {@link Catalog} implementation to use
+ * <li><code>io-impl</code> - a custom {@link org.apache.iceberg.io.FileIO} implementation to use
+ * <li><code>metrics-reporter-impl</code> - a custom {@link
+ *     org.apache.iceberg.metrics.MetricsReporter} implementation to use
* <li><code>default-namespace</code> - a namespace to use as the default
* <li><code>cache-enabled</code> - whether to enable catalog cache
* <li><code>cache.expiration-interval-ms</code> - interval in millis before expiring tables from
*     catalog cache. Refer to {@link CatalogProperties#CACHE_EXPIRATION_INTERVAL_MS} for further
*     details and significant values.
+ * <li><code>table-default.$tablePropertyKey</code> - table property $tablePropertyKey default at
+ *     catalog level
+ * <li><code>table-override.$tablePropertyKey</code> - table property $tablePropertyKey enforced
+ *     at catalog level
* </ul>
*
* <p>
diff --git a/spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java b/spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java
index 3ad3f5d0ee..cae62486ca 100644
--- a/spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java
+++ b/spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java
@@ -89,16 +89,23 @@ import org.apache.spark.sql.util.CaseInsensitiveStringMap;
* <p>This supports the following catalog configuration options:
*
* <ul>
- * <li><code>type</code> - catalog type, "hive" or "hadoop". To specify a non-hive or hadoop
- *     catalog, use the <code>catalog-impl</code> option.
- * <li><code>uri</code> - the Hive Metastore URI (Hive catalog only)
+ * <li><code>type</code> - catalog type, "hive", "hadoop" or "rest". To specify a catalog that is
+ *     not Hive or Hadoop, use the <code>catalog-impl</code> option.
+ * <li><code>uri</code> - the Hive Metastore URI for a Hive catalog, or the REST URI for a REST catalog
* <li><code>warehouse</code> - the warehouse path (Hadoop catalog only)
* <li><code>catalog-impl</code> - a custom {@link Catalog} implementation to use
+ * <li><code>io-impl</code> - a custom {@link org.apache.iceberg.io.FileIO} implementation to use
+ * <li><code>metrics-reporter-impl</code> - a custom {@link
+ *     org.apache.iceberg.metrics.MetricsReporter} implementation to use
* <li><code>default-namespace</code> - a namespace to use as the default
* <li><code>cache-enabled</code> - whether to enable catalog cache
* <li><code>cache.expiration-interval-ms</code> - interval in millis before expiring tables from
*     catalog cache. Refer to {@link CatalogProperties#CACHE_EXPIRATION_INTERVAL_MS} for further
*     details and significant values.
+ * <li><code>table-default.$tablePropertyKey</code> - table property $tablePropertyKey default at
+ *     catalog level
+ * <li><code>table-override.$tablePropertyKey</code> - table property $tablePropertyKey enforced
+ *     at catalog level
* </ul>
*
* <p>
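As a concrete illustration of the `io-impl` and `metrics-reporter-impl` options added to the Javadoc above, the catalog could be configured as follows. This is a sketch, not part of the commit: it reuses the `rest_prod` catalog name from the docs example, `S3FileIO` requires the iceberg-aws module on the classpath, and `LoggingMetricsReporter` is the reporter shipped in Iceberg core:

```plain
spark.sql.catalog.rest_prod.io-impl = org.apache.iceberg.aws.s3.S3FileIO
spark.sql.catalog.rest_prod.metrics-reporter-impl = org.apache.iceberg.metrics.LoggingMetricsReporter
```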