Copilot commented on code in PR #9908:
URL: https://github.com/apache/gravitino/pull/9908#discussion_r2782239711


##########
docs/lakehouse-generic-iceberg-table.md:
##########
@@ -0,0 +1,133 @@
+---
+title: "Iceberg table support"
+slug: /iceberg-table-support
+keywords:
+- lakehouse
+- iceberg
+- metadata
+- generic catalog
+license: "This software is licensed under the Apache License version 2."
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+## Overview
+
+This document describes how to use Apache Gravitino to manage Iceberg tables 
in a Generic Lakehouse
+Catalog. The current release supports **external Iceberg tables** 
(metadata-only registration).
+Managed Iceberg tables will be supported in a future release.
+
+## Table Management
+
+### Supported Operations
+
+For Iceberg tables in a Generic Lakehouse Catalog, the following table 
summarizes supported
+operations:
+
+| Operation | Support Status  |
+|-----------|-----------------|
+| List      | ✅ Supported     |
+| Load      | ✅ Supported     |
+| Alter     | ✅ Supported     |
+| Create    | ✅ Supported     |
+| Drop      | ✅ Supported     |
+| Purge     | ❌ Not supported |
+
+*Alter supports metadata updates; `format` and `external` are immutable.
+
+:::note Feature Limitations
+- **Managed tables:** Not supported yet. A managed Iceberg table mode (with 
commit support) will be
+  added in the next release.
+- **Purge:** Not supported for external Iceberg tables because Gravitino does 
not manage the
+  underlying data.
+- **Indexes:** Not supported for Iceberg tables in the Generic Lakehouse 
Catalog.
+:::
+
+### External vs Managed Tables
+
+- **External table (current release)**:
+  - Gravitino stores only metadata.
+  - `drop` removes Gravitino metadata only.
+  - `purge` is not supported.
+  - Use this mode to register existing Iceberg tables.
+
+- **Managed table (next release)**:
+  - Gravitino will manage Iceberg table metadata updates and commit lifecycle.
+  - `drop` and `purge` behavior will include data lifecycle management.
+
+### Table Properties
+
+Required and optional properties for Iceberg tables in a Generic Lakehouse 
Catalog:
+
+| Property   | Description                                 | Default | 
Required       | Since Version |
+|------------|---------------------------------------------|---------|----------------|---------------|
+| `format`   | Table format. Must be `iceberg`.            | (none)  | Yes     
       | 1.1.0         |
+| `external` | Must be `true` for external Iceberg tables. | false   | Yes 
(enforced) | 1.2.0         |
+
+**Immutability:** `format` and `external` are immutable after table creation.
+
+### Table Operations
+
+Table operations follow standard relational catalog patterns. See
+[Table 
Operations](./manage-relational-metadata-using-gravitino.md#table-operations).
+
+**Distribution defaulting:** If distribution is not specified (`none`), 
Gravitino will default to
+`range` when sort orders are present, `hash` when partitions are present, and 
`none` otherwise.
+
+The following sections provide examples for working with Iceberg tables.
+
+#### Creating an External Iceberg Table
+
+<Tabs groupId='language' queryString>
+<TabItem value="shell" label="Shell">
+
+```shell
+curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
+  -H "Content-Type: application/json" -d '{
+  "name": "iceberg_table",
+  "comment": "Example external Iceberg table",
+  "columns": [
+    {
+      "name": "id",
+      "type": "integer",
+      "comment": "Primary identifier",
+      "nullable": false
+    }
+  ],
+  "properties": {
+    "format": "iceberg",
+    "external": "true"
+  }
+}' 
http://localhost:8090/api/metalakes/test/catalogs/generic_lakehouse_catalog/schemas/schema/tables

Review Comment:
   The examples create an external Iceberg table without specifying a 
`location`, but the generic lakehouse catalog docs describe `location` as the 
key property for resolving table storage paths. Either document how `location` 
is resolved/required for external Iceberg tables, or include `location` in the 
example properties to avoid registering a table with no physical reference.



##########
catalogs/catalog-lakehouse-generic/src/main/java/org/apache/gravitino/catalog/lakehouse/generic/GenericCatalogOperations.java:
##########
@@ -88,7 +88,7 @@ public GenericCatalogOperations() {
   }
 
   @VisibleForTesting
-  GenericCatalogOperations(EntityStore store, IdGenerator idGenerator) {
+  public GenericCatalogOperations(EntityStore store, IdGenerator idGenerator) {

Review Comment:
   Changing this constructor to `public` increases the production API surface 
even though it’s annotated `@VisibleForTesting`. Since tests in the same 
package can access a package-private constructor, consider keeping it 
package-private to avoid encouraging non-test usage.
   ```suggestion
     GenericCatalogOperations(EntityStore store, IdGenerator idGenerator) {
   ```



##########
catalogs/catalog-lakehouse-generic/src/main/java/org/apache/gravitino/catalog/lakehouse/generic/GenericCatalogOperations.java:
##########
@@ -231,23 +231,31 @@ public Table createTable(
       SortOrder[] sortOrders,
       Index[] indexes)
       throws NoSuchSchemaException, TableAlreadyExistsException {
-    Schema schema = loadSchema(NameIdentifier.of(ident.namespace().levels()));
-    String tableLocation = calculateTableLocation(schema, ident, properties);
-
     String format = properties.getOrDefault(Table.PROPERTY_TABLE_FORMAT, null);
     Preconditions.checkArgument(
         format != null, "Table format must be specified in table properties");
     format = format.toLowerCase(Locale.ROOT);
 
-    Map<String, String> newProperties = Maps.newHashMap(properties);
-    newProperties.put(Table.PROPERTY_LOCATION, tableLocation);
-    newProperties.put(Table.PROPERTY_TABLE_FORMAT, format);
-
     // Get the table operations for the specified table format.
     Supplier<ManagedTableOperations> tableOpsSupplier = 
tableOpsCache.get(format);
     Preconditions.checkArgument(tableOpsSupplier != null, "Unsupported table 
format: %s", format);
     ManagedTableOperations tableOps = tableOpsSupplier.get();
 
+    Map<String, String> newProperties = Maps.newHashMap(properties);
+    newProperties.put(Table.PROPERTY_TABLE_FORMAT, format);
+
+    Schema schema = loadSchema(NameIdentifier.of(ident.namespace().levels()));
+    String tableLocation =
+        (String)
+            propertiesMetadata
+                .tablePropertiesMetadata()
+                .getOrDefault(properties, Table.PROPERTY_LOCATION);
+    if (StringUtils.isNotBlank(tableLocation)) {
+      newProperties.put(Table.PROPERTY_LOCATION, 
ensureTrailingSlash(tableLocation));
+    } else if (tableOps.supportsGenerateTableLocation(newProperties)) {
+      newProperties.put(Table.PROPERTY_LOCATION, 
calculateTableLocation(schema, ident, properties));

Review Comment:
   With the new `supportsGenerateTableLocation` gating, a format can now create 
a table with no `location` even when neither table/schema/catalog location is 
set (previously `calculateTableLocation` would throw). If `location` is still 
required for all lakehouse tables, add a validation/exception when 
`tableLocation` is blank and `supportsGenerateTableLocation(...)` is false.
   ```suggestion
         newProperties.put(Table.PROPERTY_LOCATION, 
calculateTableLocation(schema, ident, properties));
       } else {
         Preconditions.checkArgument(
             false,
             "Table location must be specified in properties for format '%s' 
because the format does not support generating a table location",
             format);
   ```



##########
docs/lakehouse-generic-iceberg-table.md:
##########
@@ -0,0 +1,133 @@
+---
+title: "Iceberg table support"
+slug: /iceberg-table-support
+keywords:
+- lakehouse
+- iceberg
+- metadata
+- generic catalog
+license: "This software is licensed under the Apache License version 2."
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+## Overview
+
+This document describes how to use Apache Gravitino to manage Iceberg tables 
in a Generic Lakehouse
+Catalog. The current release supports **external Iceberg tables** 
(metadata-only registration).
+Managed Iceberg tables will be supported in a future release.
+
+## Table Management
+
+### Supported Operations
+
+For Iceberg tables in a Generic Lakehouse Catalog, the following table 
summarizes supported
+operations:
+
+| Operation | Support Status  |
+|-----------|-----------------|
+| List      | ✅ Supported     |
+| Load      | ✅ Supported     |
+| Alter     | ✅ Supported     |
+| Create    | ✅ Supported     |
+| Drop      | ✅ Supported     |
+| Purge     | ❌ Not supported |
+

Review Comment:
   The Markdown tables here use a double leading pipe (`|| ...`) which renders 
as an extra empty column in most Markdown parsers (including Docusaurus). Use a 
standard table row with a single leading `|` for the header and separator rows.



##########
docs/lakehouse-generic-iceberg-table.md:
##########
@@ -0,0 +1,133 @@
+---
+title: "Iceberg table support"
+slug: /iceberg-table-support
+keywords:
+- lakehouse
+- iceberg
+- metadata
+- generic catalog
+license: "This software is licensed under the Apache License version 2."
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+## Overview
+
+This document describes how to use Apache Gravitino to manage Iceberg tables 
in a Generic Lakehouse
+Catalog. The current release supports **external Iceberg tables** 
(metadata-only registration).
+Managed Iceberg tables will be supported in a future release.
+
+## Table Management
+
+### Supported Operations
+
+For Iceberg tables in a Generic Lakehouse Catalog, the following table 
summarizes supported
+operations:
+
+| Operation | Support Status  |
+|-----------|-----------------|
+| List      | ✅ Supported     |
+| Load      | ✅ Supported     |
+| Alter     | ✅ Supported     |
+| Create    | ✅ Supported     |
+| Drop      | ✅ Supported     |
+| Purge     | ❌ Not supported |
+
+*Alter supports metadata updates; `format` and `external` are immutable.

Review Comment:
   This line starts with `*Alter` (no space) so it will be parsed as italic 
text and may not render as intended. If this is meant to be a bullet, add a 
space after `*`; if it’s meant to be emphasis, close the `*...*`.
   ```suggestion
   * Alter supports metadata updates; `format` and `external` are immutable.
   ```



##########
catalogs/catalog-lakehouse-generic/src/test/java/org/apache/gravitino/catalog/lakehouse/iceberg/integration/test/CatalogGenericCatalogIcebergIT.java:
##########
@@ -0,0 +1,244 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.gravitino.catalog.lakehouse.iceberg.integration.test;
+
+import com.google.common.collect.Maps;
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.Map;
+import org.apache.commons.io.FileUtils;
+import org.apache.gravitino.Catalog;
+import org.apache.gravitino.NameIdentifier;
+import org.apache.gravitino.Schema;
+import org.apache.gravitino.client.GravitinoMetalake;
+import org.apache.gravitino.integration.test.util.BaseIT;
+import org.apache.gravitino.integration.test.util.GravitinoITUtils;
+import org.apache.gravitino.rel.Column;
+import org.apache.gravitino.rel.Table;
+import org.apache.gravitino.rel.TableChange;
+import org.apache.gravitino.rel.expressions.transforms.Transforms;
+import org.apache.gravitino.rel.types.Types;
+import org.junit.jupiter.api.AfterAll;
+import org.junit.jupiter.api.AfterEach;
+import org.junit.jupiter.api.Assertions;
+import org.junit.jupiter.api.BeforeAll;
+import org.junit.jupiter.api.Test;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class CatalogGenericCatalogIcebergIT extends BaseIT {
+  private static final Logger LOG = 
LoggerFactory.getLogger(CatalogGenericCatalogIcebergIT.class);
+
+  public static final String METALAKE_NAME =
+      GravitinoITUtils.genRandomName("CatalogGenericLakeIcebergIT_metalake");
+
+  private final String catalogName =
+      GravitinoITUtils.genRandomName("CatalogGenericLakeIcebergIT_catalog");
+  private final String schemaPrefix = "CatalogGenericLakeIceberg_schema";
+  private final String tablePrefix = "CatalogGenericLakeIceberg_table";
+  private final String schemaName = 
GravitinoITUtils.genRandomName(schemaPrefix);
+  private final String tableName = GravitinoITUtils.genRandomName(tablePrefix);
+  private static final String TABLE_COMMENT = "table_comment";
+
+  private final String provider = "lakehouse-generic";
+
+  private GravitinoMetalake metalake;
+  private Catalog catalog;
+  private String tempDirectory;
+
+  @BeforeAll
+  public void startup() throws Exception {
+    createMetalake();
+    createCatalog();
+    createSchema();
+
+    Path tempDir = Files.createTempDirectory("icebergGenericCatalogIT");
+    tempDirectory = tempDir.toString();
+  }
+
+  @AfterAll
+  public void stop() throws IOException {
+    if (client != null) {
+      Arrays.stream(catalog.asSchemas().listSchemas())
+          .filter(schema -> !schema.equals("default"))
+          .forEach(schema -> catalog.asSchemas().dropSchema(schema, true));
+      Arrays.stream(metalake.listCatalogs()).forEach(name -> 
metalake.dropCatalog(name, true));
+      client.dropMetalake(METALAKE_NAME, true);
+    }
+    try {
+      closer.close();
+    } catch (Exception e) {
+      LOG.error("Failed to close CloseableGroup", e);
+    }
+
+    client = null;
+    FileUtils.deleteDirectory(new File(tempDirectory));
+  }

Review Comment:
   This integration test creates and deletes a temporary directory 
(`tempDirectory`), but the directory is never used by the test logic. Consider 
removing the temp dir setup/cleanup (or use it for a table `location`) to 
reduce noise and avoid potential cleanup failures if initialization is 
interrupted.



##########
core/src/main/java/org/apache/gravitino/catalog/ManagedTableOperations.java:
##########
@@ -537,4 +545,25 @@ int calculateColumnPosition(
       throw new IllegalArgumentException("Unsupported column position: " + 
position);
     }
   }
+
+  private void validateIndexSupport(Index[] indexes) {
+    if (!supportsIndex() && indexes != null && indexes.length > 0) {
+      throw new IllegalArgumentException("Indexes are not supported by this 
catalog");
+    }
+  }
+
+  private void validateIndexSupport(TableChange[] changes) {
+    if (supportsIndex()) {
+      return;
+    }
+    boolean hasIndexChange =
+        Arrays.stream(changes)
+            .anyMatch(
+                change ->
+                    change instanceof TableChange.AddIndex
+                        || change instanceof TableChange.DeleteIndex);
+    if (hasIndexChange) {
+      throw new IllegalArgumentException("Indexes are not supported by this 
catalog");
+    }

Review Comment:
   Index support is now enforced in `createTable`/`alterTable` via 
`supportsIndex()`, but there are no unit tests covering the new failure paths 
(e.g., create/alter with index changes when `supportsIndex()` is false). Add 
tests to prevent regressions in index validation behavior.



##########
catalogs/catalog-lakehouse-generic/src/main/java/org/apache/gravitino/catalog/lakehouse/iceberg/IcebergTableOperations.java:
##########
@@ -0,0 +1,125 @@
+/*
+ *  Licensed to the Apache Software Foundation (ASF) under one
+ *  or more contributor license agreements.  See the NOTICE file
+ *  distributed with this work for additional information
+ *  regarding copyright ownership.  The ASF licenses this file
+ *  to you under the Apache License, Version 2.0 (the
+ *  "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing,
+ *  software distributed under the License is distributed on an
+ *  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ *  KIND, either express or implied.  See the License for the
+ *  specific language governing permissions and limitations
+ *  under the License.
+ */
+package org.apache.gravitino.catalog.lakehouse.iceberg;
+
+import com.google.common.base.Preconditions;
+import java.util.Map;
+import java.util.Optional;
+import org.apache.gravitino.EntityStore;
+import org.apache.gravitino.NameIdentifier;
+import org.apache.gravitino.catalog.ManagedSchemaOperations;
+import org.apache.gravitino.catalog.ManagedTableOperations;
+import org.apache.gravitino.connector.SupportsSchemas;
+import org.apache.gravitino.exceptions.NoSuchSchemaException;
+import org.apache.gravitino.exceptions.TableAlreadyExistsException;
+import org.apache.gravitino.rel.Column;
+import org.apache.gravitino.rel.Table;
+import org.apache.gravitino.rel.expressions.distributions.Distribution;
+import org.apache.gravitino.rel.expressions.distributions.Distributions;
+import org.apache.gravitino.rel.expressions.sorts.SortOrder;
+import org.apache.gravitino.rel.expressions.transforms.Transform;
+import org.apache.gravitino.rel.indexes.Index;
+import org.apache.gravitino.storage.IdGenerator;
+
+public class IcebergTableOperations extends ManagedTableOperations {
+
+  private final EntityStore store;
+
+  private final ManagedSchemaOperations schemaOps;
+
+  private final IdGenerator idGenerator;
+
+  public IcebergTableOperations(
+      EntityStore store, ManagedSchemaOperations schemaOps, IdGenerator 
idGenerator) {
+    this.store = store;
+    this.schemaOps = schemaOps;
+    this.idGenerator = idGenerator;
+  }
+
+  @Override
+  protected EntityStore store() {
+    return store;
+  }
+
+  @Override
+  protected SupportsSchemas schemas() {
+    return schemaOps;
+  }
+
+  @Override
+  protected IdGenerator idGenerator() {
+    return idGenerator;
+  }
+
+  @Override
+  protected boolean supportsIndex() {
+    return false;
+  }
+
+  @Override
+  public boolean supportsGenerateTableLocation(Map<String, String> properties) 
{
+    boolean external =
+        Optional.ofNullable(properties.get(Table.PROPERTY_EXTERNAL))
+            .map(Boolean::parseBoolean)
+            .orElse(false);
+    return !external;
+  }
+
+  @Override
+  public Table createTable(
+      NameIdentifier ident,
+      Column[] columns,
+      String comment,
+      Map<String, String> properties,
+      Transform[] partitions,
+      Distribution distribution,
+      SortOrder[] sortOrders,
+      Index[] indexes)
+      throws NoSuchSchemaException, TableAlreadyExistsException {
+    boolean external =
+        Optional.ofNullable(properties.get(Table.PROPERTY_EXTERNAL))
+            .map(Boolean::parseBoolean)
+            .orElse(false);
+    Preconditions.checkArgument(external, "Iceberg table in generic catalog 
must be external");
+
+    if (Distributions.NONE.equals(distribution) || distribution == null) {
+      int sortOrderCount = sortOrders == null ? 0 : sortOrders.length;
+      int partitionCount = partitions == null ? 0 : partitions.length;
+      distribution = getIcebergDefaultDistribution(sortOrderCount > 0, 
partitionCount > 0);
+    }

Review Comment:
   `createTable` enforces `external=true`, and `supportsGenerateTableLocation` 
returns `false` for external tables, so external Iceberg tables can be created 
without any `location` being set by the catalog layer. If external registration 
needs a physical reference, validate that `Table.PROPERTY_LOCATION` is 
present/non-blank (or clearly document that Iceberg tables in this catalog are 
location-less metadata only).



##########
catalogs/catalog-lakehouse-generic/src/main/java/org/apache/gravitino/catalog/lakehouse/generic/GenericTablePropertiesMetadata.java:
##########
@@ -48,7 +49,15 @@ public class GenericTablePropertiesMetadata extends 
BasePropertiesMetadata {
                 Table.PROPERTY_TABLE_FORMAT,
                 "The format of the table",
                 true /* immutable */,
-                false /* hidden */));
+                false /* hidden */),
+            booleanPropertyEntry(
+                Table.PROPERTY_EXTERNAL,
+                "Indicate whether this is an external Iceberg table.",
+                false /* required */,
+                true /* immutable */,
+                false /* defaultValue */,
+                false /* hidden */,
+                false /* reserved */));

Review Comment:
   `Table.PROPERTY_EXTERNAL` is added as a generic table property, but the 
description says it indicates an "external Iceberg table". Since this metadata 
applies to all generic tables/formats, the description should be 
format-agnostic (or moved into format-specific property entries).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to