Copilot commented on code in PR #9678:
URL: https://github.com/apache/gravitino/pull/9678#discussion_r2780324054
##########
docs/lakehouse-generic-delta-table.md:
##########
@@ -0,0 +1,384 @@
+---
+title: "Delta Lake table support"
+slug: /delta-table-support
+keywords:
+- lakehouse
+- delta
+- delta lake
+- metadata
+- generic catalog
+license: "This software is licensed under the Apache License version 2."
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+
+## Overview
+
+This document describes how to use Apache Gravitino to manage a generic lakehouse catalog using Delta Lake as the underlying table format. Gravitino supports registering and managing metadata for external Delta tables.
+
+:::info Current Support
+Gravitino currently supports **external Delta tables only**. This means:
+- You can register existing Delta tables in Gravitino
+- Gravitino manages metadata only (schema, location, properties)
+- The physical Delta table data remains independent
+- Dropping tables from Gravitino does not delete the underlying Delta data
+:::
+
+## Table Management
+
+### Supported Operations
+
+For Delta tables in a Generic Lakehouse Catalog, the following table summarizes supported operations:
+
+| Operation | Support Status |
+|-----------|------------------------------------------------|
+| List | ✅ Full |
+| Load | ✅ Full |
+| Alter | ❌ Not supported (use Delta Lake APIs directly) |
+| Create | ✅ Register external tables only |
+| Drop | ✅ Metadata only (data preserved) |
+| Purge | ❌ Not supported for external tables |
+
+:::note Feature Limitations
+- **External Tables Only:** Must set `external=true` when creating Delta tables
+- **Alter Operations:** Not supported; modify tables using Delta Lake APIs or Spark, then update Gravitino metadata if needed
+- **Purge:** Not applicable for external tables; use DROP to remove metadata only
+- **Partitioning:** Partition information can be specified but is informational only (managed by Delta log)
+- **Sort Orders:** Not enforced by Delta Lake
+- **Distributions:** Not applicable for Delta Lake
+- **Indexes:** Not supported by Delta Lake
+:::
Review Comment:
This doc says partitioning is "informational only" and implies Gravitino
will accept partition specs, but the Delta implementation currently rejects any
non-empty partitions with an IllegalArgumentException. Please update the docs
to reflect the actual behavior (partitions not supported / must be empty).
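To make the mismatch concrete, here is a minimal, self-contained sketch of the guard described above. The class and method names (`PartitionGuard`, `checkNoPartitions`) are illustrative only, not the actual `DeltaTableOperations` code:

```java
// Illustrative stand-in for the partition check described in this comment.
// PartitionGuard and checkNoPartitions are hypothetical names, not Gravitino APIs.
public class PartitionGuard {

  static void checkNoPartitions(Object[] partitions) {
    // Mirrors the reported behavior: any non-empty partition spec is rejected.
    if (partitions != null && partitions.length > 0) {
      throw new IllegalArgumentException(
          "Delta table in generic lakehouse catalog doesn't support partitioning");
    }
  }

  public static void main(String[] args) {
    checkNoPartitions(new Object[0]); // empty partitions pass
    try {
      checkNoPartitions(new Object[] {"identity(year)"});
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Under this behavior, the docs should tell callers to pass an empty partition array rather than describing partitions as "informational".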
##########
docs/lakehouse-generic-delta-table.md:
##########
@@ -0,0 +1,384 @@
+---
+title: "Delta Lake table support"
+slug: /delta-table-support
+keywords:
+- lakehouse
+- delta
+- delta lake
+- metadata
+- generic catalog
+license: "This software is licensed under the Apache License version 2."
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+
+## Overview
+
+This document describes how to use Apache Gravitino to manage a generic lakehouse catalog using Delta Lake as the underlying table format. Gravitino supports registering and managing metadata for external Delta tables.
+
+:::info Current Support
+Gravitino currently supports **external Delta tables only**. This means:
+- You can register existing Delta tables in Gravitino
+- Gravitino manages metadata only (schema, location, properties)
+- The physical Delta table data remains independent
+- Dropping tables from Gravitino does not delete the underlying Delta data
+:::
+
+## Table Management
+
+### Supported Operations
+
+For Delta tables in a Generic Lakehouse Catalog, the following table summarizes supported operations:
+
+| Operation | Support Status |
+|-----------|------------------------------------------------|
+| List | ✅ Full |
+| Load | ✅ Full |
+| Alter | ❌ Not supported (use Delta Lake APIs directly) |
+| Create | ✅ Register external tables only |
+| Drop | ✅ Metadata only (data preserved) |
+| Purge | ❌ Not supported for external tables |
+
+:::note Feature Limitations
+- **External Tables Only:** Must set `external=true` when creating Delta tables
+- **Alter Operations:** Not supported; modify tables using Delta Lake APIs or Spark, then update Gravitino metadata if needed
+- **Purge:** Not applicable for external tables; use DROP to remove metadata only
+- **Partitioning:** Partition information can be specified but is informational only (managed by Delta log)
+- **Sort Orders:** Not enforced by Delta Lake
+- **Distributions:** Not applicable for Delta Lake
+- **Indexes:** Not supported by Delta Lake
+:::
+
+### Data Type Mappings
+
+Delta Lake uses Apache Spark data types. The following table shows type mappings between Gravitino and Delta/Spark:
+
+| Gravitino Type  | Delta/Spark Type    | Notes      |
+|-----------------|---------------------|------------|
+| `Boolean`       | `BooleanType`       |            |
+| `Byte`          | `ByteType`          |            |
+| `Short`         | `ShortType`         |            |
+| `Integer`       | `IntegerType`       |            |
+| `Long`          | `LongType`          |            |
+| `Float`         | `FloatType`         |            |
+| `Double`        | `DoubleType`        |            |
+| `Decimal(p, s)` | `DecimalType(p, s)` |            |
+| `String`        | `StringType`        |            |
+| `Binary`        | `BinaryType`        |            |
+| `Date`          | `DateType`          |            |
+| `Timestamp`     | `TimestampNTZType`  | Spark 3.4+ |
+| `Timestamp_tz`  | `TimestampType`     |            |
+| `List`          | `ArrayType`         |            |
+| `Map`           | `MapType`           |            |
+| `Struct`        | `StructType`        |            |
+
+### Table Properties
+
+Required and optional properties for Delta tables in a Generic Lakehouse Catalog:
+
+| Property   | Description | Default | Required | Since Version |
+|------------|-------------|---------|----------|---------------|
+| `format`   | Table format: must be `delta`. | (none) | Yes | 1.2.0 |
+| `location` | Storage path for the Delta table. Must point to a directory containing Delta Lake metadata (`_delta_log`). Supports `file://`, `s3://`, `hdfs://`, `abfs://`, `gs://`, and other Hadoop-compatible file systems. | (none) | Yes | 1.2.0 |
+| `external` | Must be `true` for Delta tables. Indicates that Gravitino manages metadata only and will not delete physical data when the table is dropped. | (none) | Yes | 1.2.0 |
+
+**Location Requirement:** Must be specified at the table level for external Delta tables. See [Location Resolution](./lakehouse-generic-catalog.md#key-property-location).
+
+### Table Operations
+
+Table operations follow standard relational catalog patterns with Delta-specific considerations. See [Table Operations](./manage-relational-metadata-using-gravitino.md#table-operations) for comprehensive documentation.
+
+The following sections provide examples and important details for working with Delta tables.
+
+#### Registering an External Delta Table
+
+Register an existing Delta table in Gravitino without moving or modifying the underlying data:
+
+<Tabs groupId='language' queryString>
+<TabItem value="shell" label="Shell">
+
+```shell
+curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
+ -H "Content-Type: application/json" -d '{
+ "name": "customer_orders",
+ "comment": "Customer orders Delta table",
+ "columns": [
+ {
+ "name": "order_id",
+ "type": "long",
+ "comment": "Order identifier",
+ "nullable": false
+ },
+ {
+ "name": "customer_id",
+ "type": "long",
+ "comment": "Customer identifier",
+ "nullable": false
+ },
+ {
+ "name": "order_date",
+ "type": "date",
+ "comment": "Order date",
+ "nullable": false
+ },
+ {
+ "name": "total_amount",
+ "type": "decimal(10,2)",
+ "comment": "Total order amount",
+ "nullable": true
+ }
+ ],
+ "properties": {
+ "format": "delta",
+ "external": "true",
+ "location": "s3://my-bucket/delta-tables/customer_orders"
+ }
+}' http://localhost:8090/api/metalakes/test/catalogs/generic_lakehouse_delta_catalog/schemas/sales/tables
+```
+
+</TabItem>
+<TabItem value="java" label="Java">
+
+```java
+import org.apache.gravitino.Catalog;
+import org.apache.gravitino.NameIdentifier;
+import org.apache.gravitino.rel.Column;
+import org.apache.gravitino.rel.TableCatalog;
+import org.apache.gravitino.rel.types.Types;
+import com.google.common.collect.ImmutableMap;
+import java.util.Map;
+
+Catalog catalog = gravitinoClient.loadCatalog("generic_lakehouse_delta_catalog");
+TableCatalog tableCatalog = catalog.asTableCatalog();
+
+Map<String, String> tableProperties = ImmutableMap.<String, String>builder()
+ .put("format", "delta")
+ .put("external", "true")
+ .put("location", "s3://my-bucket/delta-tables/customer_orders")
+ .build();
+
+tableCatalog.createTable(
+ NameIdentifier.of("sales", "customer_orders"),
+ new Column[] {
+    Column.of("order_id", Types.LongType.get(), "Order identifier", false, false, null),
+    Column.of("customer_id", Types.LongType.get(), "Customer identifier", false, false, null),
+    Column.of("order_date", Types.DateType.get(), "Order date", false, false, null),
+    Column.of("total_amount", Types.DecimalType.of(10, 2), "Total order amount", true, false, null)
+ },
+ "Customer orders Delta table",
+ tableProperties,
+ null, // partitions (informational only)
+ null, // distributions (not applicable)
+ null, // sortOrders (not enforced)
+ null // indexes (not supported)
+);
+```
+
+</TabItem>
+</Tabs>
+
+:::important Schema Specification
+When registering a Delta table in Gravitino, you must provide the schema (columns) in the CREATE TABLE request. Gravitino stores this schema as metadata but does not validate it against the Delta table's actual schema.
+
+**Best Practice:** Ensure the schema you provide matches the actual Delta table schema to avoid inconsistencies.
+:::
+
+#### Loading a Delta Table
+
+<Tabs groupId='language' queryString>
+<TabItem value="shell" label="Shell">
+
+```shell
+curl -X GET -H "Accept: application/vnd.gravitino.v1+json" \
+  http://localhost:8090/api/metalakes/test/catalogs/generic_lakehouse_delta_catalog/schemas/sales/tables/customer_orders
+```
+
+</TabItem>
+<TabItem value="java" label="Java">
+
+```java
+Table table = tableCatalog.loadTable(
+ NameIdentifier.of("sales", "customer_orders")
+);
+
+System.out.println("Table location: " + table.properties().get("location"));
+System.out.println("Columns: " + Arrays.toString(table.columns()));
+```
+
+</TabItem>
+</Tabs>
+
+#### Dropping a Delta Table
+
+Dropping a Delta table from Gravitino removes only the metadata entry. The physical Delta table data remains intact.
+
+<Tabs groupId='language' queryString>
+<TabItem value="shell" label="Shell">
+
+```shell
+curl -X DELETE -H "Accept: application/vnd.gravitino.v1+json" \
+  http://localhost:8090/api/metalakes/test/catalogs/generic_lakehouse_delta_catalog/schemas/sales/tables/customer_orders
+```
+
+</TabItem>
+<TabItem value="java" label="Java">
+
+```java
+boolean dropped = tableCatalog.dropTable(
+ NameIdentifier.of("sales", "customer_orders")
+);
+// The Delta table files at the location are NOT deleted
+```
+
+</TabItem>
+</Tabs>
+
+:::tip Metadata-Only Drop
+Since Delta tables are external, dropping them from Gravitino:
+- ✅ Removes the table from Gravitino's metadata
+- ✅ Preserves the Delta table data at its location
+- ✅ Allows you to re-register the same table later
+
+The Delta table can still be accessed directly via Delta Lake APIs, Spark, or other tools.
+:::
+
+## Working with Delta Tables
+
+### Using Spark to Modify Delta Tables
+
+Since Gravitino does not support ALTER operations for Delta tables, use Apache Spark or other Delta Lake tools to modify table structure:
+
+```java
+import org.apache.spark.sql.SparkSession;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
+import io.delta.tables.DeltaTable;
+import static org.apache.spark.sql.functions.lit;
+
+// Create Spark session with Delta Lake support
+SparkSession spark = SparkSession.builder()
+ .appName("Delta Table Modification")
+ .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
+  .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
+ .getOrCreate();
+
+// Read the table location from Gravitino
+String tableLocation = "s3://my-bucket/delta-tables/customer_orders";
+
+// Add a new column using Delta Lake
+DeltaTable deltaTable = DeltaTable.forPath(spark, tableLocation);
+Dataset<Row> df = deltaTable.toDF()
+ .withColumn("status", lit("pending"));
+
+df.write()
+ .format("delta")
+ .mode("overwrite")
+ .option("overwriteSchema", "true")
+ .save(tableLocation);
+```
+
+After modifying the Delta table, you can:
+1. Drop the table from Gravitino
+2. Re-register it with the updated schema
+
+### Reading Delta Tables via Gravitino
+
+Once registered in Gravitino, you can query Delta table metadata and use the location to read data:
+
+```java
+// Load table metadata from Gravitino
+Table table = tableCatalog.loadTable(NameIdentifier.of("sales", "customer_orders"));
+String location = table.properties().get("location");
+
+// Use the location to read the Delta table with Spark
+Dataset<Row> df = spark.read()
+ .format("delta")
+ .load(location);
+
+df.show();
+```
+
+### Partitioned Delta Tables
+
+While Delta Lake supports partitioning, Gravitino treats partition information as metadata only:
+
+```java
+// Register a partitioned Delta table
+Map<String, String> properties = ImmutableMap.<String, String>builder()
+ .put("format", "delta")
+ .put("external", "true")
+ .put("location", "s3://my-bucket/delta-tables/sales_partitioned")
+ .build();
+
+// You can specify partitions for documentation purposes
+Transform[] partitions = new Transform[] {
+ Transforms.identity("year"),
+ Transforms.identity("month")
+};
+
+tableCatalog.createTable(
+ NameIdentifier.of("sales", "sales_partitioned"),
+ columns,
+ "Partitioned sales data",
+ properties,
+ partitions, // Stored as metadata, not enforced
+ null, null, null
+);
+```
Review Comment:
The "Partitioned Delta Tables" section provides an example that passes
partitions to createTable(), but DeltaTableOperations currently throws if
partitions are specified. Either remove/replace this example or change the
implementation to truly accept and store partitions as metadata (and adjust
tests accordingly).
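If the docs are updated rather than the implementation, the limitations bullet and this example could be replaced with wording along these lines (a suggestion only, assuming the guard stays):

```markdown
- **Partitioning:** Not supported; `createTable` rejects any non-empty partition
  transforms with an `IllegalArgumentException`. Pass an empty transform array
  (`Transforms.EMPTY_TRANSFORM` in Java). Partitioning is managed entirely by
  the Delta log.
```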
##########
catalogs/catalog-lakehouse-generic/src/test/java/org/apache/gravitino/catalog/lakehouse/delta/integration/test/CatalogGenericCatalogDeltaIT.java:
##########
@@ -0,0 +1,596 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.gravitino.catalog.lakehouse.delta.integration.test;
+
+import com.google.common.collect.Maps;
+import io.delta.kernel.Operation;
+import io.delta.kernel.Snapshot;
+import io.delta.kernel.TransactionBuilder;
+import io.delta.kernel.data.Row;
+import io.delta.kernel.defaults.engine.DefaultEngine;
+import io.delta.kernel.engine.Engine;
+import io.delta.kernel.types.IntegerType;
+import io.delta.kernel.types.StringType;
+import io.delta.kernel.types.StructType;
+import io.delta.kernel.utils.CloseableIterable;
+import io.delta.kernel.utils.CloseableIterator;
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.Map;
+import org.apache.commons.io.FileUtils;
+import org.apache.gravitino.Catalog;
+import org.apache.gravitino.NameIdentifier;
+import org.apache.gravitino.Schema;
+import org.apache.gravitino.catalog.lakehouse.delta.DeltaConstants;
+import org.apache.gravitino.client.GravitinoMetalake;
+import org.apache.gravitino.integration.test.util.BaseIT;
+import org.apache.gravitino.integration.test.util.GravitinoITUtils;
+import org.apache.gravitino.rel.Column;
+import org.apache.gravitino.rel.Table;
+import org.apache.gravitino.rel.TableChange;
+import org.apache.gravitino.rel.expressions.NamedReference;
+import org.apache.gravitino.rel.expressions.distributions.Distributions;
+import org.apache.gravitino.rel.expressions.sorts.SortOrder;
+import org.apache.gravitino.rel.expressions.sorts.SortOrders;
+import org.apache.gravitino.rel.expressions.transforms.Transform;
+import org.apache.gravitino.rel.expressions.transforms.Transforms;
+import org.apache.gravitino.rel.indexes.Index;
+import org.apache.gravitino.rel.indexes.Indexes;
+import org.apache.gravitino.rel.types.Types;
+import org.apache.hadoop.conf.Configuration;
+import org.junit.jupiter.api.AfterAll;
+import org.junit.jupiter.api.AfterEach;
+import org.junit.jupiter.api.Assertions;
+import org.junit.jupiter.api.BeforeAll;
+import org.junit.jupiter.api.Test;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * Integration tests for Delta table support in Gravitino generic lakehouse catalog.
+ *
+ * <p>These tests verify:
+ *
+ * <ul>
+ * <li>Creating a physical Delta table using Delta Kernel
+ * <li>Registering the Delta table in Gravitino catalog
+ * <li>Loading table metadata from Gravitino
+ * <li>Reading actual Delta table using location from Gravitino metadata
+ *   <li>Verifying table still exists after dropping from Gravitino (metadata-only drop)
+ * </ul>
+ */
+public class CatalogGenericCatalogDeltaIT extends BaseIT {
+  private static final Logger LOG = LoggerFactory.getLogger(CatalogGenericCatalogDeltaIT.class);
+ public static final String METALAKE_NAME =
+ GravitinoITUtils.genRandomName("CatalogGenericDeltaIT_metalake");
+
+  public String catalogName = GravitinoITUtils.genRandomName("CatalogGenericDeltaIT_catalog");
+ public String SCHEMA_PREFIX = "CatalogGenericDelta_schema";
+ public String schemaName = GravitinoITUtils.genRandomName(SCHEMA_PREFIX);
+ public String TABLE_PREFIX = "CatalogGenericDelta_table";
+ public String tableName = GravitinoITUtils.genRandomName(TABLE_PREFIX);
+ public static final String TABLE_COMMENT = "Delta table comment";
+ public static final String COL_NAME1 = "id";
+ public static final String COL_NAME2 = "name";
+ protected final String provider = "lakehouse-generic";
+ protected GravitinoMetalake metalake;
+ protected Catalog catalog;
+ protected String tempDirectory;
+ protected Engine deltaEngine;
+
+ @BeforeAll
+ public void startup() throws Exception {
+ createMetalake();
+ createCatalog();
+ createSchema();
+
+ Path tempDir = Files.createTempDirectory("deltaTempDir");
+ tempDirectory = tempDir.toString();
+
+ deltaEngine = DefaultEngine.create(new Configuration());
+ }
+
+ @AfterAll
+ public void stop() throws IOException {
+ if (client != null) {
+ Arrays.stream(catalog.asSchemas().listSchemas())
+ .filter(schema -> !schema.equals("default"))
+ .forEach(
+ (schema -> {
+ catalog.asSchemas().dropSchema(schema, true);
+ }));
+ Arrays.stream(metalake.listCatalogs())
+ .forEach(
+ catalogName -> {
+ metalake.dropCatalog(catalogName, true);
+ });
+ client.dropMetalake(METALAKE_NAME, true);
+ }
+ try {
+ closer.close();
+ } catch (Exception e) {
+ LOG.error("Failed to close CloseableGroup", e);
+ }
+
+ client = null;
+
+ FileUtils.deleteDirectory(new File(tempDirectory));
+ }
+
+ @AfterEach
+ public void resetSchema() throws InterruptedException {
+ catalog.asSchemas().dropSchema(schemaName, true);
+ createSchema();
+ }
+
+ @Test
+ public void testCreateDeltaTableAndRegisterToGravitino() throws Exception {
+ String tableLocation = tempDirectory + "/" + tableName;
+
+ // Step 1: Create a physical Delta table using Delta Kernel
+ StructType schema =
+        new StructType().add("id", IntegerType.INTEGER, true).add("name", StringType.STRING, true);
+
+ TransactionBuilder txnBuilder =
+ io.delta.kernel.Table.forPath(deltaEngine, tableLocation)
+            .createTransactionBuilder(deltaEngine, "test", Operation.CREATE_TABLE);
+
+ txnBuilder
+ .withSchema(deltaEngine, schema)
+ .withPartitionColumns(deltaEngine, Collections.emptyList())
+ .build(deltaEngine)
+ .commit(deltaEngine, emptyRowIterable());
+
+ LOG.info("Created Delta table at: {}", tableLocation);
+
+ // Step 2: Register the Delta table in Gravitino catalog
+ Column[] gravitinoColumns =
+ new Column[] {
+ Column.of(COL_NAME1, Types.IntegerType.get(), "id column"),
+ Column.of(COL_NAME2, Types.StringType.get(), "name column")
+ };
+
+ NameIdentifier nameIdentifier = NameIdentifier.of(schemaName, tableName);
+ Map<String, String> properties = createTableProperties();
+    properties.put(Table.PROPERTY_TABLE_FORMAT, DeltaConstants.DELTA_TABLE_FORMAT);
+ properties.put(Table.PROPERTY_LOCATION, tableLocation);
+ properties.put(Table.PROPERTY_EXTERNAL, "true");
+
+ Table gravitinoTable =
+ catalog
+ .asTableCatalog()
+ .createTable(
+ nameIdentifier,
+ gravitinoColumns,
+ TABLE_COMMENT,
+ properties,
+ Transforms.EMPTY_TRANSFORM,
+ null,
+ null);
+
+ Assertions.assertEquals(tableName, gravitinoTable.name());
+ Assertions.assertEquals(TABLE_COMMENT, gravitinoTable.comment());
+ LOG.info("Registered Delta table in Gravitino catalog");
+
+ // Step 3: Load table metadata from Gravitino
+ Table loadedTable = catalog.asTableCatalog().loadTable(nameIdentifier);
+ Assertions.assertEquals(tableName, loadedTable.name());
+ Assertions.assertEquals(2, loadedTable.columns().length);
+
+ // Note: Gravitino may normalize the location by adding trailing slash
+    String locationFromMetadata = loadedTable.properties().get(Table.PROPERTY_LOCATION);
+ Assertions.assertTrue(
+ locationFromMetadata.equals(tableLocation)
+ || locationFromMetadata.equals(tableLocation + "/"),
+ "Location should match with or without trailing slash");
+
+    // Step 4: Use the location from Gravitino metadata to read actual Delta table
+ Assertions.assertNotNull(locationFromMetadata);
+
+ // Read Delta table using Delta Kernel
+    io.delta.kernel.Table deltaTable =
+        io.delta.kernel.Table.forPath(deltaEngine, locationFromMetadata);
+ Snapshot snapshot = deltaTable.getLatestSnapshot(deltaEngine);
+ Assertions.assertNotNull(snapshot);
+
+ StructType deltaSchema = snapshot.getSchema(deltaEngine);
+ Assertions.assertEquals(2, deltaSchema.fields().size());
+ Assertions.assertEquals(COL_NAME1, deltaSchema.fields().get(0).getName());
+ Assertions.assertEquals(COL_NAME2, deltaSchema.fields().get(1).getName());
+
+ // Step 5: Drop table from Gravitino catalog (metadata only)
+ boolean dropped = catalog.asTableCatalog().dropTable(nameIdentifier);
+ Assertions.assertTrue(dropped);
+
+ // Step 6: Verify Delta table still exists at location and can be accessed
+ io.delta.kernel.Table deltaTableAfterDrop =
+ io.delta.kernel.Table.forPath(deltaEngine, locationFromMetadata);
+    Snapshot snapshotAfterDrop = deltaTableAfterDrop.getLatestSnapshot(deltaEngine);
+ Assertions.assertNotNull(snapshotAfterDrop);
+    Assertions.assertEquals(2, snapshotAfterDrop.getSchema(deltaEngine).fields().size());
+ }
+
+ @Test
+ public void testCreateDeltaTableWithoutExternalFails() {
+ Column[] columns = createColumns();
+ NameIdentifier nameIdentifier = NameIdentifier.of(schemaName, tableName);
+
+ Map<String, String> properties = createTableProperties();
+ String tableLocation = tempDirectory + "/" + tableName;
+    properties.put(Table.PROPERTY_TABLE_FORMAT, DeltaConstants.DELTA_TABLE_FORMAT);
+ properties.put(Table.PROPERTY_LOCATION, tableLocation);
+
+ Exception exception =
+ Assertions.assertThrows(
+ Exception.class,
+ () ->
+ catalog
+ .asTableCatalog()
+ .createTable(
+ nameIdentifier,
+ columns,
+ TABLE_COMMENT,
+ properties,
+ Transforms.EMPTY_TRANSFORM,
+ null,
+ null));
+
+    Assertions.assertTrue(exception.getMessage().contains("external Delta tables"));
+ }
+
+ @Test
+ public void testCreateDeltaTableWithoutLocationFails() {
+ Column[] columns = createColumns();
+ NameIdentifier nameIdentifier = NameIdentifier.of(schemaName, tableName);
+
+ Map<String, String> properties = createTableProperties();
+    properties.put(Table.PROPERTY_TABLE_FORMAT, DeltaConstants.DELTA_TABLE_FORMAT);
+ properties.put(Table.PROPERTY_EXTERNAL, "true");
+
+ Exception exception =
+ Assertions.assertThrows(
+ Exception.class,
+ () ->
+ catalog
+ .asTableCatalog()
+ .createTable(
+ nameIdentifier,
+ columns,
+ TABLE_COMMENT,
+ properties,
+ Transforms.EMPTY_TRANSFORM,
+ null,
+ null));
+
+ Assertions.assertTrue(exception.getMessage().contains("location"));
+ }
+
+ @Test
+ public void testAlterDeltaTableFails() throws Exception {
+ String tableLocation = tempDirectory + "/" + tableName + "_alter";
+
+ // Create physical Delta table
+ StructType schema =
+        new StructType().add("id", IntegerType.INTEGER, true).add("name", StringType.STRING, true);
+
+ TransactionBuilder txnBuilder =
+ io.delta.kernel.Table.forPath(deltaEngine, tableLocation)
+            .createTransactionBuilder(deltaEngine, "test", Operation.CREATE_TABLE);
+
+ txnBuilder
+ .withSchema(deltaEngine, schema)
+ .withPartitionColumns(deltaEngine, Collections.emptyList())
+ .build(deltaEngine)
+ .commit(deltaEngine, emptyRowIterable());
+
+ // Register in Gravitino
+ Column[] columns = createColumns();
+ NameIdentifier nameIdentifier = NameIdentifier.of(schemaName, tableName);
+
+ Map<String, String> properties = createTableProperties();
+    properties.put(Table.PROPERTY_TABLE_FORMAT, DeltaConstants.DELTA_TABLE_FORMAT);
+ properties.put(Table.PROPERTY_LOCATION, tableLocation);
+ properties.put(Table.PROPERTY_EXTERNAL, "true");
+
+ catalog
+ .asTableCatalog()
+ .createTable(
+ nameIdentifier,
+ columns,
+ TABLE_COMMENT,
+ properties,
+ Transforms.EMPTY_TRANSFORM,
+ null,
+ null);
+
+    TableChange addColumn =
+        TableChange.addColumn(new String[] {"new_col"}, Types.StringType.get());
+
+ Exception exception =
+ Assertions.assertThrows(
+ UnsupportedOperationException.class,
+            () -> catalog.asTableCatalog().alterTable(nameIdentifier, addColumn));
+
+ Assertions.assertTrue(exception.getMessage().contains("ALTER TABLE"));
+ Assertions.assertTrue(exception.getMessage().contains("not supported"));
+ }
+
+ @Test
+ public void testPurgeDeltaTableFails() throws Exception {
+ String tableLocation = tempDirectory + "/" + tableName + "_purge";
+
+ // Create physical Delta table
+ StructType schema =
+        new StructType().add("id", IntegerType.INTEGER, true).add("name", StringType.STRING, true);
+
+ TransactionBuilder txnBuilder =
+ io.delta.kernel.Table.forPath(deltaEngine, tableLocation)
+            .createTransactionBuilder(deltaEngine, "test", Operation.CREATE_TABLE);
+
+ txnBuilder
+ .withSchema(deltaEngine, schema)
+ .withPartitionColumns(deltaEngine, Collections.emptyList())
+ .build(deltaEngine)
+ .commit(deltaEngine, emptyRowIterable());
+
+ // Register in Gravitino
+ Column[] columns = createColumns();
+ NameIdentifier nameIdentifier = NameIdentifier.of(schemaName, tableName);
+
+ Map<String, String> properties = createTableProperties();
+    properties.put(Table.PROPERTY_TABLE_FORMAT, DeltaConstants.DELTA_TABLE_FORMAT);
+ properties.put(Table.PROPERTY_LOCATION, tableLocation);
+ properties.put(Table.PROPERTY_EXTERNAL, "true");
+
+ catalog
+ .asTableCatalog()
+ .createTable(
+ nameIdentifier,
+ columns,
+ TABLE_COMMENT,
+ properties,
+ Transforms.EMPTY_TRANSFORM,
+ null,
+ null);
+
+ Exception exception =
+ Assertions.assertThrows(
+ UnsupportedOperationException.class,
+ () -> catalog.asTableCatalog().purgeTable(nameIdentifier));
+
+ Assertions.assertTrue(exception.getMessage().contains("Purge"));
+ Assertions.assertTrue(exception.getMessage().contains("not supported"));
+ }
+
+ @Test
+ public void testCreateDeltaTableWithPartitionsThrowsException() {
+ Column[] columns = createColumns();
+ NameIdentifier nameIdentifier = NameIdentifier.of(schemaName, tableName);
+
+ Map<String, String> properties = createTableProperties();
+ String tableLocation = tempDirectory + "/" + tableName;
+    properties.put(Table.PROPERTY_TABLE_FORMAT, DeltaConstants.DELTA_TABLE_FORMAT);
+ properties.put(Table.PROPERTY_LOCATION, tableLocation);
+ properties.put(Table.PROPERTY_EXTERNAL, "true");
+
+    Transform[] partitions = new Transform[] {Transforms.identity("created_at")};
+
+ IllegalArgumentException exception =
+ Assertions.assertThrows(
+ IllegalArgumentException.class,
+ () ->
+ catalog
+ .asTableCatalog()
+ .createTable(
+ nameIdentifier,
+ columns,
+ TABLE_COMMENT,
+ properties,
+ partitions,
+ null,
+ null,
+ null));
+
+ Assertions.assertTrue(exception.getMessage().contains("partitioning"));
+ Assertions.assertTrue(exception.getMessage().contains("doesn't support"));
+ }
+
+ @Test
+ public void testCreateDeltaTableWithDistributionThrowsException() {
+ Column[] columns = createColumns();
+ NameIdentifier nameIdentifier = NameIdentifier.of(schemaName, tableName);
+
+ Map<String, String> properties = createTableProperties();
+ String tableLocation = tempDirectory + "/" + tableName;
+    properties.put(Table.PROPERTY_TABLE_FORMAT, DeltaConstants.DELTA_TABLE_FORMAT);
+ properties.put(Table.PROPERTY_LOCATION, tableLocation);
+ properties.put(Table.PROPERTY_EXTERNAL, "true");
+
+ IllegalArgumentException exception =
+ Assertions.assertThrows(
+ IllegalArgumentException.class,
+ () ->
+ catalog
+ .asTableCatalog()
+ .createTable(
+ nameIdentifier,
+ columns,
+ TABLE_COMMENT,
+ properties,
+ null,
+ Distributions.hash(5, NamedReference.field(COL_NAME1)),
+ null,
+ null));
+
+ Assertions.assertTrue(exception.getMessage().contains("distribution"));
+ Assertions.assertTrue(exception.getMessage().contains("doesn't support"));
+ }
+
+ @Test
+ public void testCreateDeltaTableWithSortOrdersThrowsException() {
+ Column[] columns = createColumns();
+ NameIdentifier nameIdentifier = NameIdentifier.of(schemaName, tableName);
+
+ Map<String, String> properties = createTableProperties();
+ String tableLocation = tempDirectory + "/" + tableName;
+    properties.put(Table.PROPERTY_TABLE_FORMAT, DeltaConstants.DELTA_TABLE_FORMAT);
+ properties.put(Table.PROPERTY_LOCATION, tableLocation);
+ properties.put(Table.PROPERTY_EXTERNAL, "true");
+
+ IllegalArgumentException exception =
+ Assertions.assertThrows(
+ IllegalArgumentException.class,
+ () ->
+ catalog
+ .asTableCatalog()
+ .createTable(
+ nameIdentifier,
+ columns,
+ TABLE_COMMENT,
+ properties,
+ null,
+ null,
+                        new SortOrder[] {SortOrders.ascending(NamedReference.field(COL_NAME1))},
+ null));
+
+ Assertions.assertTrue(exception.getMessage().contains("sort orders"));
+ Assertions.assertTrue(exception.getMessage().contains("doesn't support"));
+ }
+
+ @Test
+ public void testCreateDeltaTableWithIndexesThrowsException() {
+ Column[] columns = createColumns();
+ NameIdentifier nameIdentifier = NameIdentifier.of(schemaName, tableName);
+
+ Map<String, String> properties = createTableProperties();
+ String tableLocation = tempDirectory + "/" + tableName;
+    properties.put(Table.PROPERTY_TABLE_FORMAT, DeltaConstants.DELTA_TABLE_FORMAT);
+ properties.put(Table.PROPERTY_LOCATION, tableLocation);
+ properties.put(Table.PROPERTY_EXTERNAL, "true");
+
+    Index[] indexes = new Index[] {Indexes.primary("pk_id", new String[][] {{COL_NAME1}})};
+
+ IllegalArgumentException exception =
+ Assertions.assertThrows(
+ IllegalArgumentException.class,
+ () ->
+ catalog
+ .asTableCatalog()
+ .createTable(
+ nameIdentifier,
+ columns,
+ TABLE_COMMENT,
+ properties,
+ null,
+ null,
+ null,
+ indexes));
+
+ Assertions.assertTrue(exception.getMessage().contains("indexes"));
+ Assertions.assertTrue(exception.getMessage().contains("doesn't support"));
+ }
+
+ protected Map<String, String> createSchemaProperties() {
+ Map<String, String> properties = new HashMap<>();
+ properties.put("key1", "val1");
+ properties.put("key2", "val2");
+ return properties;
+ }
+
+ private void createMetalake() {
+ GravitinoMetalake[] gravitinoMetalakes = client.listMetalakes();
+ Assertions.assertEquals(0, gravitinoMetalakes.length);
+
+ client.createMetalake(METALAKE_NAME, "comment", Collections.emptyMap());
+ GravitinoMetalake loadMetalake = client.loadMetalake(METALAKE_NAME);
+ Assertions.assertEquals(METALAKE_NAME, loadMetalake.name());
+
+ metalake = loadMetalake;
+ }
+
+ protected void createCatalog() {
+ Map<String, String> properties = Maps.newHashMap();
+ metalake.createCatalog(catalogName, Catalog.Type.RELATIONAL, provider, "comment", properties);
+
+ catalog = metalake.loadCatalog(catalogName);
+ }
+
+ private void createSchema() throws InterruptedException {
+ Map<String, String> schemaProperties = createSchemaProperties();
+ String comment = "schema comment";
+ catalog.asSchemas().createSchema(schemaName, comment, schemaProperties);
+ Schema loadSchema = catalog.asSchemas().loadSchema(schemaName);
+ Assertions.assertEquals(schemaName, loadSchema.name());
+ Assertions.assertEquals(comment, loadSchema.comment());
+ Assertions.assertEquals("val1", loadSchema.properties().get("key1"));
+ Assertions.assertEquals("val2", loadSchema.properties().get("key2"));
+ }
+
+ private Column[] createColumns() {
+ Column col1 = Column.of(COL_NAME1, Types.IntegerType.get(), "id column");
+ Column col2 = Column.of(COL_NAME2, Types.StringType.get(), "name column");
+ Column col3 = Column.of("created_at", Types.DateType.get(), "created_at column");
+ return new Column[] {col1, col2, col3};
+ }
+
+ protected Map<String, String> createTableProperties() {
+ Map<String, String> properties = Maps.newHashMap();
+ properties.put("key1", "val1");
+ properties.put("key2", "val2");
+ return properties;
+ }
+
+ /**
+ * Helper method to create an empty {@code CloseableIterable<Row>} for Delta Kernel transaction
+ * commits.
+ */
+ private static CloseableIterable<Row> emptyRowIterable() {
+ return new CloseableIterable<Row>() {
+
+ @Override
+ public CloseableIterator<Row> iterator() {
+ return new CloseableIterator<Row>() {
+ @Override
+ public void close() throws IOException {
+ // No resources to close
+ }
+
+ @Override
+ public boolean hasNext() {
+ return false;
+ }
+
+ @Override
+ public Row next() {
+ throw new java.util.NoSuchElementException("Empty iterator");
+ }
Review Comment:
Avoid using fully-qualified class names inside methods when a normal import
will do (e.g., NoSuchElementException). This keeps the code consistent with the
rest of the codebase's import style.
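
As a minimal, self-contained sketch of the style the comment suggests — a plain `java.util.Iterator` stands in here for Delta Kernel's `CloseableIterator`, which is not available outside the project — the exception is imported once and referenced by its simple name:

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

public class EmptyIteratorExample {

  // Generic empty iterator; next() throws via the imported simple name
  // rather than the fully-qualified java.util.NoSuchElementException.
  static <T> Iterator<T> emptyIterator() {
    return new Iterator<T>() {
      @Override
      public boolean hasNext() {
        return false;
      }

      @Override
      public T next() {
        throw new NoSuchElementException("Empty iterator");
      }
    };
  }

  public static void main(String[] args) {
    Iterator<String> it = emptyIterator();
    System.out.println(it.hasNext());
    try {
      it.next();
    } catch (NoSuchElementException e) {
      System.out.println(e.getMessage());
    }
  }
}
```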
##########
docs/lakehouse-generic-delta-table.md:
##########
@@ -0,0 +1,384 @@
Review Comment:
PR description says this introduces no user-facing change, but this PR adds
a new supported table format (delta) and a new user guide page. Please update
the PR description/user-facing-change answer to match the actual impact.
##########
docs/lakehouse-generic-delta-table.md:
##########
@@ -0,0 +1,384 @@
+:::note Feature Limitations
+- **External Tables Only:** Must set `external=true` when creating Delta tables
+- **Alter Operations:** Not supported; modify tables using Delta Lake APIs or Spark, then update Gravitino metadata if needed
+- **Purge:** Not applicable for external tables; use DROP to remove metadata only
+- **Partitioning:** Partition information can be specified but is informational only (managed by Delta log)
+- **Sort Orders:** Not enforced by Delta Lake
+- **Distributions:** Not applicable for Delta Lake
+- **Indexes:** Not supported by Delta Lake
+:::
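
The registration requirements above can be sketched as the property map an external Delta table needs (a hedged illustration only — the literal string keys are assumptions standing in for the `Table.PROPERTY_TABLE_FORMAT`, `Table.PROPERTY_LOCATION`, and `Table.PROPERTY_EXTERNAL` constants used in the integration tests):

```java
import java.util.HashMap;
import java.util.Map;

public class DeltaTablePropsSketch {
  public static void main(String[] args) {
    // Minimal property map for registering an external Delta table.
    // Literal keys below are placeholders for the Gravitino Table.PROPERTY_*
    // constants; the location must point at an existing Delta table root.
    Map<String, String> properties = new HashMap<>();
    properties.put("format", "delta");           // table format: Delta Lake
    properties.put("location", "/data/events");  // required for external tables
    properties.put("external", "true");          // external tables only
    System.out.println(properties.get("external"));
    System.out.println(properties.size());
  }
}
```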
+
+### Data Type Mappings
+
+Delta Lake uses Apache Spark data types. The following table shows type mappings between Gravitino and Delta/Spark:
+
+| Gravitino Type  | Delta/Spark Type    | Notes      |
+|-----------------|---------------------|------------|
+| `Boolean`       | `BooleanType`       |            |
+| `Byte`          | `ByteType`          |            |
+| `Short`         | `ShortType`         |            |
+| `Integer`       | `IntegerType`       |            |
+| `Long`          | `LongType`          |            |
+| `Float`         | `FloatType`         |            |
+| `Double`        | `DoubleType`        |            |
+| `Decimal(p, s)` | `DecimalType(p, s)` |            |
+| `String`        | `StringType`        |            |
+| `Binary`        | `BinaryType`        |            |
+| `Date`          | `DateType`          |            |
+| `Timestamp`     | `TimestampType`     |            |
+| `Timestamp_tz`  | `TimestampNTZType`  | Spark 3.4+ |
Review Comment:
The timestamp type mapping appears reversed: Gravitino has both `timestamp`
(no TZ) and `timestamp_tz` (with TZ), while Spark has `TimestampNTZType` (no
TZ) and `TimestampType` (with TZ semantics). Please correct the table so the
timezone/non-timezone types map consistently.
```suggestion
| `Timestamp`     | `TimestampNTZType`  | Spark 3.4+ |
| `Timestamp_tz`  | `TimestampType`     |            |
```
##########
catalogs/catalog-lakehouse-generic/src/main/java/org/apache/gravitino/catalog/lakehouse/generic/GenericTablePropertiesMetadata.java:
##########
@@ -40,7 +40,10 @@ public class GenericTablePropertiesMetadata extends BasePropertiesMetadata {
ImmutableList.of(
stringOptionalPropertyEntry(
Table.PROPERTY_LOCATION,
- "The root directory of the generic table.",
+ "The directory of the table. For managed table, if this is not
specified"
+ + " in the table property, it will use the one in catalog
/ schema level and "
+ + "concatenate with the table name. For external table,
this property is"
+ + "required.",
Review Comment:
The location property description has a missing space in "isrequired", which
will render incorrectly in property metadata/help text. Please change it to "is
required" (and consider adding a space before "required" in the concatenation).
```suggestion
+ " required.",
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]