Copilot commented on code in PR #9908:
URL: https://github.com/apache/gravitino/pull/9908#discussion_r2782239711
##########
docs/lakehouse-generic-iceberg-table.md:
##########
@@ -0,0 +1,133 @@
+---
+title: "Iceberg table support"
+slug: /iceberg-table-support
+keywords:
+- lakehouse
+- iceberg
+- metadata
+- generic catalog
+license: "This software is licensed under the Apache License version 2."
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+## Overview
+
+This document describes how to use Apache Gravitino to manage Iceberg tables
in a Generic Lakehouse
+Catalog. The current release supports **external Iceberg tables**
(metadata-only registration).
+Managed Iceberg tables will be supported in a future release.
+
+## Table Management
+
+### Supported Operations
+
+For Iceberg tables in a Generic Lakehouse Catalog, the following table
summarizes supported
+operations:
+
+| Operation | Support Status |
+|-----------|-----------------|
+| List | ✅ Supported |
+| Load | ✅ Supported |
+| Alter | ✅ Supported |
+| Create | ✅ Supported |
+| Drop | ✅ Supported |
+| Purge | ❌ Not supported |
+
+*Alter supports metadata updates; `format` and `external` are immutable.
+
+:::note Feature Limitations
+- **Managed tables:** Not supported yet. A managed Iceberg table mode (with
commit support) will be
+ added in the next release.
+- **Purge:** Not supported for external Iceberg tables because Gravitino does
not manage the
+ underlying data.
+- **Indexes:** Not supported for Iceberg tables in the Generic Lakehouse
Catalog.
+:::
+
+### External vs Managed Tables
+
+- **External table (current release)**:
+ - Gravitino stores only metadata.
+ - `drop` removes Gravitino metadata only.
+ - `purge` is not supported.
+ - Use this mode to register existing Iceberg tables.
+
+- **Managed table (next release)**:
+ - Gravitino will manage Iceberg table metadata updates and commit lifecycle.
+ - `drop` and `purge` behavior will include data lifecycle management.
+
+### Table Properties
+
+Required and optional properties for Iceberg tables in a Generic Lakehouse
Catalog:
+
+| Property | Description | Default |
Required | Since Version |
+|------------|---------------------------------------------|---------|----------------|---------------|
+| `format` | Table format. Must be `iceberg`. | (none) | Yes
| 1.1.0 |
+| `external` | Must be `true` for external Iceberg tables. | false | Yes
(enforced) | 1.2.0 |
+
+**Immutability:** `format` and `external` are immutable after table creation.
+
+### Table Operations
+
+Table operations follow standard relational catalog patterns. See
+[Table
Operations](./manage-relational-metadata-using-gravitino.md#table-operations).
+
+**Distribution defaulting:** If distribution is not specified (`none`),
Gravitino will default to
+`range` when sort orders are present, `hash` when partitions are present, and
`none` otherwise.
+
+The following sections provide examples for working with Iceberg tables.
+
+#### Creating an External Iceberg Table
+
+<Tabs groupId='language' queryString>
+<TabItem value="shell" label="Shell">
+
+```shell
+curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
+ -H "Content-Type: application/json" -d '{
+ "name": "iceberg_table",
+ "comment": "Example external Iceberg table",
+ "columns": [
+ {
+ "name": "id",
+ "type": "integer",
+ "comment": "Primary identifier",
+ "nullable": false
+ }
+ ],
+ "properties": {
+ "format": "iceberg",
+ "external": "true"
+ }
+}'
http://localhost:8090/api/metalakes/test/catalogs/generic_lakehouse_catalog/schemas/schema/tables
Review Comment:
The examples create an external Iceberg table without specifying a
`location`, but the generic lakehouse catalog docs describe `location` as the
key property for resolving table storage paths. Either document how `location`
is resolved/required for external Iceberg tables, or include `location` in the
example properties to avoid registering a table with no physical reference.
##########
catalogs/catalog-lakehouse-generic/src/main/java/org/apache/gravitino/catalog/lakehouse/generic/GenericCatalogOperations.java:
##########
@@ -88,7 +88,7 @@ public GenericCatalogOperations() {
}
@VisibleForTesting
- GenericCatalogOperations(EntityStore store, IdGenerator idGenerator) {
+ public GenericCatalogOperations(EntityStore store, IdGenerator idGenerator) {
Review Comment:
Changing this constructor to `public` increases the production API surface
even though it’s annotated `@VisibleForTesting`. Since tests in the same
package can access a package-private constructor, consider keeping it
package-private to avoid encouraging non-test usage.
```suggestion
GenericCatalogOperations(EntityStore store, IdGenerator idGenerator) {
```
##########
catalogs/catalog-lakehouse-generic/src/main/java/org/apache/gravitino/catalog/lakehouse/generic/GenericCatalogOperations.java:
##########
@@ -231,23 +231,31 @@ public Table createTable(
SortOrder[] sortOrders,
Index[] indexes)
throws NoSuchSchemaException, TableAlreadyExistsException {
- Schema schema = loadSchema(NameIdentifier.of(ident.namespace().levels()));
- String tableLocation = calculateTableLocation(schema, ident, properties);
-
String format = properties.getOrDefault(Table.PROPERTY_TABLE_FORMAT, null);
Preconditions.checkArgument(
format != null, "Table format must be specified in table properties");
format = format.toLowerCase(Locale.ROOT);
- Map<String, String> newProperties = Maps.newHashMap(properties);
- newProperties.put(Table.PROPERTY_LOCATION, tableLocation);
- newProperties.put(Table.PROPERTY_TABLE_FORMAT, format);
-
// Get the table operations for the specified table format.
Supplier<ManagedTableOperations> tableOpsSupplier =
tableOpsCache.get(format);
Preconditions.checkArgument(tableOpsSupplier != null, "Unsupported table
format: %s", format);
ManagedTableOperations tableOps = tableOpsSupplier.get();
+ Map<String, String> newProperties = Maps.newHashMap(properties);
+ newProperties.put(Table.PROPERTY_TABLE_FORMAT, format);
+
+ Schema schema = loadSchema(NameIdentifier.of(ident.namespace().levels()));
+ String tableLocation =
+ (String)
+ propertiesMetadata
+ .tablePropertiesMetadata()
+ .getOrDefault(properties, Table.PROPERTY_LOCATION);
+ if (StringUtils.isNotBlank(tableLocation)) {
+ newProperties.put(Table.PROPERTY_LOCATION,
ensureTrailingSlash(tableLocation));
+ } else if (tableOps.supportsGenerateTableLocation(newProperties)) {
+ newProperties.put(Table.PROPERTY_LOCATION,
calculateTableLocation(schema, ident, properties));
Review Comment:
With the new `supportsGenerateTableLocation` gating, a format can now create
a table with no `location` even when neither table/schema/catalog location is
set (previously `calculateTableLocation` would throw). If `location` is still
required for all lakehouse tables, add a validation/exception when
`tableLocation` is blank and `supportsGenerateTableLocation(...)` is false.
```suggestion
newProperties.put(Table.PROPERTY_LOCATION,
calculateTableLocation(schema, ident, properties));
} else {
Preconditions.checkArgument(
false,
"Table location must be specified in properties for format '%s'
because the format does not support generating a table location",
format);
```
##########
docs/lakehouse-generic-iceberg-table.md:
##########
@@ -0,0 +1,133 @@
+---
+title: "Iceberg table support"
+slug: /iceberg-table-support
+keywords:
+- lakehouse
+- iceberg
+- metadata
+- generic catalog
+license: "This software is licensed under the Apache License version 2."
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+## Overview
+
+This document describes how to use Apache Gravitino to manage Iceberg tables
in a Generic Lakehouse
+Catalog. The current release supports **external Iceberg tables**
(metadata-only registration).
+Managed Iceberg tables will be supported in a future release.
+
+## Table Management
+
+### Supported Operations
+
+For Iceberg tables in a Generic Lakehouse Catalog, the following table
summarizes supported
+operations:
+
+| Operation | Support Status |
+|-----------|-----------------|
+| List | ✅ Supported |
+| Load | ✅ Supported |
+| Alter | ✅ Supported |
+| Create | ✅ Supported |
+| Drop | ✅ Supported |
+| Purge | ❌ Not supported |
+
Review Comment:
The Markdown tables here use a double leading pipe (`|| ...`) which renders
as an extra empty column in most Markdown parsers (including Docusaurus). Use a
standard table row with a single leading `|` for the header and separator rows.
##########
docs/lakehouse-generic-iceberg-table.md:
##########
@@ -0,0 +1,133 @@
+---
+title: "Iceberg table support"
+slug: /iceberg-table-support
+keywords:
+- lakehouse
+- iceberg
+- metadata
+- generic catalog
+license: "This software is licensed under the Apache License version 2."
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+## Overview
+
+This document describes how to use Apache Gravitino to manage Iceberg tables
in a Generic Lakehouse
+Catalog. The current release supports **external Iceberg tables**
(metadata-only registration).
+Managed Iceberg tables will be supported in a future release.
+
+## Table Management
+
+### Supported Operations
+
+For Iceberg tables in a Generic Lakehouse Catalog, the following table
summarizes supported
+operations:
+
+| Operation | Support Status |
+|-----------|-----------------|
+| List | ✅ Supported |
+| Load | ✅ Supported |
+| Alter | ✅ Supported |
+| Create | ✅ Supported |
+| Drop | ✅ Supported |
+| Purge | ❌ Not supported |
+
+*Alter supports metadata updates; `format` and `external` are immutable.
Review Comment:
This line starts with `*Alter` (no space) so it will be parsed as italic
text and may not render as intended. If this is meant to be a bullet, add a
space after `*`; if it’s meant to be emphasis, close the `*...*`.
```suggestion
* Alter supports metadata updates; `format` and `external` are immutable.
```
##########
catalogs/catalog-lakehouse-generic/src/test/java/org/apache/gravitino/catalog/lakehouse/iceberg/integration/test/CatalogGenericCatalogIcebergIT.java:
##########
@@ -0,0 +1,244 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.gravitino.catalog.lakehouse.iceberg.integration.test;
+
+import com.google.common.collect.Maps;
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.Map;
+import org.apache.commons.io.FileUtils;
+import org.apache.gravitino.Catalog;
+import org.apache.gravitino.NameIdentifier;
+import org.apache.gravitino.Schema;
+import org.apache.gravitino.client.GravitinoMetalake;
+import org.apache.gravitino.integration.test.util.BaseIT;
+import org.apache.gravitino.integration.test.util.GravitinoITUtils;
+import org.apache.gravitino.rel.Column;
+import org.apache.gravitino.rel.Table;
+import org.apache.gravitino.rel.TableChange;
+import org.apache.gravitino.rel.expressions.transforms.Transforms;
+import org.apache.gravitino.rel.types.Types;
+import org.junit.jupiter.api.AfterAll;
+import org.junit.jupiter.api.AfterEach;
+import org.junit.jupiter.api.Assertions;
+import org.junit.jupiter.api.BeforeAll;
+import org.junit.jupiter.api.Test;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class CatalogGenericCatalogIcebergIT extends BaseIT {
+ private static final Logger LOG =
LoggerFactory.getLogger(CatalogGenericCatalogIcebergIT.class);
+
+ public static final String METALAKE_NAME =
+ GravitinoITUtils.genRandomName("CatalogGenericLakeIcebergIT_metalake");
+
+ private final String catalogName =
+ GravitinoITUtils.genRandomName("CatalogGenericLakeIcebergIT_catalog");
+ private final String schemaPrefix = "CatalogGenericLakeIceberg_schema";
+ private final String tablePrefix = "CatalogGenericLakeIceberg_table";
+ private final String schemaName =
GravitinoITUtils.genRandomName(schemaPrefix);
+ private final String tableName = GravitinoITUtils.genRandomName(tablePrefix);
+ private static final String TABLE_COMMENT = "table_comment";
+
+ private final String provider = "lakehouse-generic";
+
+ private GravitinoMetalake metalake;
+ private Catalog catalog;
+ private String tempDirectory;
+
+ @BeforeAll
+ public void startup() throws Exception {
+ createMetalake();
+ createCatalog();
+ createSchema();
+
+ Path tempDir = Files.createTempDirectory("icebergGenericCatalogIT");
+ tempDirectory = tempDir.toString();
+ }
+
+ @AfterAll
+ public void stop() throws IOException {
+ if (client != null) {
+ Arrays.stream(catalog.asSchemas().listSchemas())
+ .filter(schema -> !schema.equals("default"))
+ .forEach(schema -> catalog.asSchemas().dropSchema(schema, true));
+ Arrays.stream(metalake.listCatalogs()).forEach(name ->
metalake.dropCatalog(name, true));
+ client.dropMetalake(METALAKE_NAME, true);
+ }
+ try {
+ closer.close();
+ } catch (Exception e) {
+ LOG.error("Failed to close CloseableGroup", e);
+ }
+
+ client = null;
+ FileUtils.deleteDirectory(new File(tempDirectory));
+ }
Review Comment:
This integration test creates and deletes a temporary directory
(`tempDirectory`), but the directory is never used by the test logic. Consider
removing the temp dir setup/cleanup (or use it for a table `location`) to
reduce noise and avoid potential cleanup failures if initialization is
interrupted.
##########
core/src/main/java/org/apache/gravitino/catalog/ManagedTableOperations.java:
##########
@@ -537,4 +545,25 @@ int calculateColumnPosition(
throw new IllegalArgumentException("Unsupported column position: " +
position);
}
}
+
+ private void validateIndexSupport(Index[] indexes) {
+ if (!supportsIndex() && indexes != null && indexes.length > 0) {
+ throw new IllegalArgumentException("Indexes are not supported by this
catalog");
+ }
+ }
+
+ private void validateIndexSupport(TableChange[] changes) {
+ if (supportsIndex()) {
+ return;
+ }
+ boolean hasIndexChange =
+ Arrays.stream(changes)
+ .anyMatch(
+ change ->
+ change instanceof TableChange.AddIndex
+ || change instanceof TableChange.DeleteIndex);
+ if (hasIndexChange) {
+ throw new IllegalArgumentException("Indexes are not supported by this
catalog");
+ }
Review Comment:
Index support is now enforced in `createTable`/`alterTable` via
`supportsIndex()`, but there are no unit tests covering the new failure paths
(e.g., create/alter with index changes when `supportsIndex()` is false). Add
tests to prevent regressions in index validation behavior.
##########
catalogs/catalog-lakehouse-generic/src/main/java/org/apache/gravitino/catalog/lakehouse/iceberg/IcebergTableOperations.java:
##########
@@ -0,0 +1,125 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.gravitino.catalog.lakehouse.iceberg;
+
+import com.google.common.base.Preconditions;
+import java.util.Map;
+import java.util.Optional;
+import org.apache.gravitino.EntityStore;
+import org.apache.gravitino.NameIdentifier;
+import org.apache.gravitino.catalog.ManagedSchemaOperations;
+import org.apache.gravitino.catalog.ManagedTableOperations;
+import org.apache.gravitino.connector.SupportsSchemas;
+import org.apache.gravitino.exceptions.NoSuchSchemaException;
+import org.apache.gravitino.exceptions.TableAlreadyExistsException;
+import org.apache.gravitino.rel.Column;
+import org.apache.gravitino.rel.Table;
+import org.apache.gravitino.rel.expressions.distributions.Distribution;
+import org.apache.gravitino.rel.expressions.distributions.Distributions;
+import org.apache.gravitino.rel.expressions.sorts.SortOrder;
+import org.apache.gravitino.rel.expressions.transforms.Transform;
+import org.apache.gravitino.rel.indexes.Index;
+import org.apache.gravitino.storage.IdGenerator;
+
+public class IcebergTableOperations extends ManagedTableOperations {
+
+ private final EntityStore store;
+
+ private final ManagedSchemaOperations schemaOps;
+
+ private final IdGenerator idGenerator;
+
+ public IcebergTableOperations(
+ EntityStore store, ManagedSchemaOperations schemaOps, IdGenerator
idGenerator) {
+ this.store = store;
+ this.schemaOps = schemaOps;
+ this.idGenerator = idGenerator;
+ }
+
+ @Override
+ protected EntityStore store() {
+ return store;
+ }
+
+ @Override
+ protected SupportsSchemas schemas() {
+ return schemaOps;
+ }
+
+ @Override
+ protected IdGenerator idGenerator() {
+ return idGenerator;
+ }
+
+ @Override
+ protected boolean supportsIndex() {
+ return false;
+ }
+
+ @Override
+ public boolean supportsGenerateTableLocation(Map<String, String> properties)
{
+ boolean external =
+ Optional.ofNullable(properties.get(Table.PROPERTY_EXTERNAL))
+ .map(Boolean::parseBoolean)
+ .orElse(false);
+ return !external;
+ }
+
+ @Override
+ public Table createTable(
+ NameIdentifier ident,
+ Column[] columns,
+ String comment,
+ Map<String, String> properties,
+ Transform[] partitions,
+ Distribution distribution,
+ SortOrder[] sortOrders,
+ Index[] indexes)
+ throws NoSuchSchemaException, TableAlreadyExistsException {
+ boolean external =
+ Optional.ofNullable(properties.get(Table.PROPERTY_EXTERNAL))
+ .map(Boolean::parseBoolean)
+ .orElse(false);
+ Preconditions.checkArgument(external, "Iceberg table in generic catalog
must be external");
+
+ if (Distributions.NONE.equals(distribution) || distribution == null) {
+ int sortOrderCount = sortOrders == null ? 0 : sortOrders.length;
+ int partitionCount = partitions == null ? 0 : partitions.length;
+ distribution = getIcebergDefaultDistribution(sortOrderCount > 0,
partitionCount > 0);
+ }
Review Comment:
`createTable` enforces `external=true`, and `supportsGenerateTableLocation`
returns `false` for external tables, so external Iceberg tables can be created
without any `location` being set by the catalog layer. If external registration
needs a physical reference, validate that `Table.PROPERTY_LOCATION` is
present/non-blank (or clearly document that Iceberg tables in this catalog are
location-less metadata only).
##########
catalogs/catalog-lakehouse-generic/src/main/java/org/apache/gravitino/catalog/lakehouse/generic/GenericTablePropertiesMetadata.java:
##########
@@ -48,7 +49,15 @@ public class GenericTablePropertiesMetadata extends
BasePropertiesMetadata {
Table.PROPERTY_TABLE_FORMAT,
"The format of the table",
true /* immutable */,
- false /* hidden */));
+ false /* hidden */),
+ booleanPropertyEntry(
+ Table.PROPERTY_EXTERNAL,
+ "Indicate whether this is an external Iceberg table.",
+ false /* required */,
+ true /* immutable */,
+ false /* defaultValue */,
+ false /* hidden */,
+ false /* reserved */));
Review Comment:
`Table.PROPERTY_EXTERNAL` is added as a generic table property, but the
description says it indicates an "external Iceberg table". Since this metadata
applies to all generic tables/formats, the description should be
format-agnostic (or moved into format-specific property entries).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]