mchades commented on code in PR #9173:
URL: https://github.com/apache/gravitino/pull/9173#discussion_r2606587055
##########
docs/lakehouse-generic-catalog.md:
##########
@@ -0,0 +1,186 @@
+---
+title: "Generic Lakehouse Catalog"
+slug: /lakehouse-generic-catalog
+keywords:
+ - lakehouse
+ - lance
+ - metadata
+ - generic catalog
+ - file system
+license: "This software is licensed under the Apache License version 2."
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+## Overview
+
+The Generic Lakehouse Catalog is a Gravitino catalog implementation designed
to seamlessly integrate with lakehouse storage systems built on file
system-based architectures. This catalog enables unified metadata management
for lakehouse tables stored on various storage backends, providing a consistent
interface for data discovery, governance, and access control.
+
+Currently, Gravitino fully supports the **Lance** lakehouse format, with plans
to extend support to additional formats in the future.
+
+### Why Use Generic Lakehouse Catalog?
+
+1. **Unified Metadata Management**: Single source of truth for table metadata
across multiple storage backends
+2. **Multi-Format Support**: Extensible architecture to support various
lakehouse table formats such as Lance, Iceberg, Hudi, etc.
+3. **Storage Flexibility**: Work with any file system, local, or cloud object
stores
+4. **Gravitino Integration**: Leverage Gravitino's metadata management, access
control, lineage tracking, and data discovery
+5. **Easy Migration**: Register existing lakehouse tables without data movement
+
+## Catalog Management
+
+### Capabilities
+
+The Generic Lakehouse Catalog provides comprehensive relational metadata
management capabilities equivalent to standard relational catalogs:
+
+**Supported Operations:**
+- ✅ Create, read, update, and delete catalogs
+- ✅ List all catalogs in a metalake
+- ✅ Manage catalog properties and metadata
+- ✅ Set and modify catalog locations
+- ✅ Configure storage backend credentials
+
+For detailed information on available operations, see [Manage Relational
Metadata Using Gravitino](./manage-relational-metadata-using-gravitino.md).
+
+### Catalog Properties
+
+| Property   | Description                                  | Example                 | Required | Since Version |
+|------------|----------------------------------------------|-------------------------|----------|---------------|
+| `provider` | Catalog provider type                        | `lakehouse-generic`     | Yes      | 1.1.0         |
+| `location` | Root storage path for all schemas and tables | `s3://bucket/lakehouse` | False    | 1.1.0         |
Review Comment:
Use `No` instead of `False` to align with other docs.
##########
docs/lakehouse-generic-catalog.md:
##########
@@ -0,0 +1,186 @@
+---
+title: "Generic Lakehouse Catalog"
+slug: /lakehouse-generic-catalog
+keywords:
+ - lakehouse
+ - lance
+ - metadata
+ - generic catalog
+ - file system
+license: "This software is licensed under the Apache License version 2."
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+## Overview
+
+The Generic Lakehouse Catalog is a Gravitino catalog implementation designed
to seamlessly integrate with lakehouse storage systems built on file
system-based architectures. This catalog enables unified metadata management
for lakehouse tables stored on various storage backends, providing a consistent
interface for data discovery, governance, and access control.
+
+Currently, Gravitino fully supports the **Lance** lakehouse format, with plans
to extend support to additional formats in the future.
+
+### Why Use Generic Lakehouse Catalog?
+
+1. **Unified Metadata Management**: Single source of truth for table metadata
across multiple storage backends
+2. **Multi-Format Support**: Extensible architecture to support various
lakehouse table formats such as Lance, Iceberg, Hudi, etc.
+3. **Storage Flexibility**: Work with any file system, local, or cloud object
stores
+4. **Gravitino Integration**: Leverage Gravitino's metadata management, access
control, lineage tracking, and data discovery
+5. **Easy Migration**: Register existing lakehouse tables without data movement
+
+## Catalog Management
+
+### Capabilities
+
+The Generic Lakehouse Catalog provides comprehensive relational metadata
management capabilities equivalent to standard relational catalogs:
+
+**Supported Operations:**
+- ✅ Create, read, update, and delete catalogs
+- ✅ List all catalogs in a metalake
+- ✅ Manage catalog properties and metadata
+- ✅ Set and modify catalog locations
+- ✅ Configure storage backend credentials
+
+For detailed information on available operations, see [Manage Relational
Metadata Using Gravitino](./manage-relational-metadata-using-gravitino.md).
+
+### Catalog Properties
+
+| Property   | Description                                  | Example                 | Required | Since Version |
+|------------|----------------------------------------------|-------------------------|----------|---------------|
+| `provider` | Catalog provider type                        | `lakehouse-generic`     | Yes      | 1.1.0         |
+| `location` | Root storage path for all schemas and tables | `s3://bucket/lakehouse` | False    | 1.1.0         |
+
+#### Key Property: `location`
+
+The `location` property specifies the root directory for the lakehouse table.
All schemas and tables are stored under this location unless explicitly
overridden at the schema or table level.
+
+**Location Resolution Hierarchy:**
+1. Table-level `location` (highest priority)
+2. Schema-level `location`, then the location of the table will be
`{schema_location}/{table_name}`
+3. Catalog-level `location` (fallback), then the location of the table will be
`{catalog_location}/{schema_name}/{table_name}`
+
+**Example Location Hierarchy:**
+```
+Catalog location: hdfs://namenode:9000/lakehouse
+└── Schema: sales (hdfs://namenode:9000/lakehouse/sales)
+ ├── Table: orders (hdfs://namenode:9000/lakehouse/sales/orders)
+ └── Table: customers (custom: s3://analytics-bucket/customers)
Review Comment:
This example does not clarify whether a location is specified for the schema
or the table.
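One way to make the example unambiguous is to state the rule it illustrates. Below is a minimal sketch of the resolution order quoted above. It assumes a hypothetical `LocationResolver` helper (not a Gravitino class), and it assumes that resolution fails when no level defines a location:

```java
// Hypothetical helper illustrating the documented precedence:
// table-level location, then schema-level, then catalog-level.
final class LocationResolver {
    private LocationResolver() {}

    // A null argument means "no location configured at that level".
    static String resolve(
            String tableLocation,
            String schemaLocation,
            String catalogLocation,
            String schemaName,
            String tableName) {
        if (tableLocation != null) {
            // 1. An explicit table location is used as-is.
            return tableLocation;
        }
        if (schemaLocation != null) {
            // 2. {schema_location}/{table_name}
            return join(schemaLocation, tableName);
        }
        if (catalogLocation != null) {
            // 3. {catalog_location}/{schema_name}/{table_name}
            return join(join(catalogLocation, schemaName), tableName);
        }
        // Assumption: a location must exist at some level.
        throw new IllegalArgumentException(
            "No location set at table, schema, or catalog level");
    }

    // Joins two path segments with exactly one separator between them.
    private static String join(String base, String child) {
        return base.endsWith("/") ? base + child : base + "/" + child;
    }
}
```

Under this reading, `orders` in the example has no table-level or schema-level location and resolves from the catalog location to `hdfs://namenode:9000/lakehouse/sales/orders`, while `customers` carries its own table-level location. The doc example could annotate each entry with the level that supplied its path.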
##########
docs/lakehouse-generic-lance-table.md:
##########
@@ -0,0 +1,294 @@
+---
+title: "Lance table support"
+slug: /lance-table-support
+keywords:
+- lakehouse
+- lance
+- metadata
+- generic catalog
+license: "This software is licensed under the Apache License version 2."
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+
+## Overview
+
+This document describes how to use Apache Gravitino to manage a generic
lakehouse catalog using Lance as the underlying table format.
+
+
+## Table Management
+
+### Supported Operations
+
+For Lance tables in a Generic Lakehouse Catalog, the following table
summarizes supported operations:
+
+| Operation | Support Status |
+|-----------|-----------------|
+| List | ✅ Full |
+| Load | ✅ Full |
+| Alter | Not support now |
+| Create | ✅ Full |
+| Register | ✅ Full |
+| Drop | ✅ Full |
+| Purge | ✅ Full |
+
+:::note Feature Limitations
+- **Partitioning:** Not currently supported
+- **Sort Orders:** Not currently supported
+- **Distributions:** Not currently supported
+- **Indexes:** Not currently supported
+ :::
+
+### Data Type Mappings
+
+Lance uses Apache Arrow for table schemas. The following table shows type
mappings between Gravitino and Arrow:
+
+| Gravitino Type | Arrow Type |
+|----------------------------------|-----------------------------------------|
+| `Struct` | `Struct` |
+| `Map` | `Map` |
+| `List` | `Array` |
+| `Boolean` | `Boolean` |
+| `Byte` | `Int8` |
+| `Short` | `Int16` |
+| `Integer` | `Int32` |
+| `Long` | `Int64` |
+| `Float` | `Float` |
+| `Double` | `Double` |
+| `String` | `Utf8` |
+| `Binary` | `Binary` |
+| `Decimal(p, s)` | `Decimal(p, s)` (128-bit) |
+| `Date` | `Date` |
+| `Timestamp`/`Timestamp(6)` | `TimestampType withoutZone` |
+| `Timestamp(0)` | `TimestampType Second withoutZone` |
+| `Timestamp(3)` | `TimestampType Millisecond withoutZone` |
+| `Timestamp(9)` | `TimestampType Nanosecond withoutZone` |
+| `Timestamp_tz`/`Timestamp_tz(6)` | `TimestampType Microsecond withUtc` |
+| `Timestamp_tz(0)` | `TimestampType Second withUtc` |
+| `Timestamp_tz(3)` | `TimestampType Millisecond withUtc` |
+| `Timestamp_tz(9)` | `TimestampType Nanosecond withUtc` |
+| `Time`/`Time(9)` | `Time Nanosecond` |
+| `Null` | `Null` |
+| `Fixed(n)` | `Fixed-Size Binary(n)` |
+| `Interval_year` | `Interval(YearMonth)` |
+| `Interval_day` | `Duration(Microsecond)` |
+| `External(arrow_field_json_str)` | Any Arrow Field |
+
+### External Type Support
+
+For Arrow types not natively mapped in Gravitino, use the
`External(arrow_field_json_str)` type, which accepts a JSON string
representation of an Arrow `Field`.
+
+**Requirements:**
+- JSON must conform to Apache Arrow [Field
specification](https://github.com/apache/arrow-java/blob/ed81e5981a2bee40584b3a411ed755cb4cc5b91f/vector/src/main/java/org/apache/arrow/vector/types/pojo/Field.java#L80C1-L86C68)
+- `name` attribute must match column name exactly
+- `nullable` attribute must match column nullability
+- `children` array:
+ - Empty for primitive types
+ - Contains child field definitions for complex types (Struct, List)
+
+**Examples:**
+
+| Arrow Type | External Type Definition
|
+|-------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| `Large Utf8` |
`External("{\"name\":\"col_name\",\"nullable\":true,\"type\":{\"name\":\"largeutf8\"},\"children\":[]}")`
|
+| `Large Binary` |
`External("{\"name\":\"col_name\",\"nullable\":true,\"type\":{\"name\":\"largebinary\"},\"children\":[]}")`
|
+| `Large List` |
`External("{\"name\":\"col_name\",\"nullable\":true,\"type\":{\"name\":\"largelist\"},\"children\":[{\"name\":\"element\",\"nullable\":true,\"type\":{\"name\":\"int\",\"bitWidth\":32,\"isSigned\":true},\"children\":[]}]}")`
|
+| `Fixed-Size List` |
`External("{\"name\":\"col_name\",\"nullable\":true,\"type\":{\"name\":\"fixedsizelist\",\"listSize\":10},\"children\":[{\"name\":\"element\",\"nullable\":true,\"type\":{\"name\":\"int\",\"bitWidth\":32,\"isSigned\":true},\"children\":[]}]}")`
|
+
+### Table Properties
+
+Required and optional properties for tables in a Generic Lakehouse Catalog:
+
+| Property | Description | Default | Required | Since Version |
+|-----------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------|--------------|---------------|
+| `format` | Table format: `lance`, currently only `lance` is fully supported. | (none) | Yes | 1.1.0 |
+| `location` | Storage path for table metadata and data, Lance currently supports: S3, GCS, OSS, AZ, File, Memory and file-object-store. | (none) | Conditional* | 1.1.0 |
+| `external` | Whether the data directory is an external location. If it's `true`, dropping a table will only remove metadata in Gravitino and will not delete the data directory, and purge table will delete both. For a non-external table, dropping will drop both. | false | No | 1.1.0 |
+| `lance.creation-mode` | Create mode: for create table, it can be `CREATE`, `EXIST_OK` or `OVERWRITE`. and it should be `CREATE` or `OVERWRITE` for registering tables | `CREATE` | No | 1.1.0 |
Review Comment:
Do you need to describe the behavior corresponding to different modes?
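For instance, the row could spell out semantics along the lines of the sketch below. This is an assumption on my side: it mirrors the `create` / `exist_ok` / `overwrite` behavior that this PR documents for `CreateNamespace` in `lance-rest-service.md`, and the class and method names are hypothetical, not Gravitino code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of the three creation modes.
final class CreationModeSketch {
    enum Mode { CREATE, EXIST_OK, OVERWRITE }

    // Maps a table name to its stored definition.
    private final Map<String, String> tables = new HashMap<>();

    // Applies the mode and returns the definition stored afterwards.
    String create(String name, String definition, Mode mode) {
        boolean exists = tables.containsKey(name);
        switch (mode) {
            case CREATE:
                // Assumed: fail if the table already exists.
                if (exists) {
                    throw new IllegalStateException("Table already exists: " + name);
                }
                tables.put(name, definition);
                break;
            case EXIST_OK:
                // Assumed: succeed and keep the existing table untouched.
                if (!exists) {
                    tables.put(name, definition);
                }
                break;
            case OVERWRITE:
                // Assumed: replace any existing table.
                tables.put(name, definition);
                break;
        }
        return tables.get(name);
    }
}
```

If the table-level behavior differs from the namespace-level behavior (for example, whether `OVERWRITE` also removes existing data files), that difference is what the doc should call out.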
##########
docs/lakehouse-generic-catalog.md:
##########
@@ -0,0 +1,186 @@
+---
+title: "Generic Lakehouse Catalog"
+slug: /lakehouse-generic-catalog
+keywords:
+ - lakehouse
+ - lance
+ - metadata
+ - generic catalog
+ - file system
+license: "This software is licensed under the Apache License version 2."
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+## Overview
+
+The Generic Lakehouse Catalog is a Gravitino catalog implementation designed
to seamlessly integrate with lakehouse storage systems built on file
system-based architectures. This catalog enables unified metadata management
for lakehouse tables stored on various storage backends, providing a consistent
interface for data discovery, governance, and access control.
+
+Currently, Gravitino fully supports the **Lance** lakehouse format, with plans
to extend support to additional formats in the future.
+
+### Why Use Generic Lakehouse Catalog?
+
+1. **Unified Metadata Management**: Single source of truth for table metadata
across multiple storage backends
+2. **Multi-Format Support**: Extensible architecture to support various
lakehouse table formats such as Lance, Iceberg, Hudi, etc.
+3. **Storage Flexibility**: Work with any file system, local, or cloud object
stores
+4. **Gravitino Integration**: Leverage Gravitino's metadata management, access
control, lineage tracking, and data discovery
+5. **Easy Migration**: Register existing lakehouse tables without data movement
+
+## Catalog Management
+
+### Capabilities
+
+The Generic Lakehouse Catalog provides comprehensive relational metadata
management capabilities equivalent to standard relational catalogs:
+
+**Supported Operations:**
+- ✅ Create, read, update, and delete catalogs
+- ✅ List all catalogs in a metalake
+- ✅ Manage catalog properties and metadata
+- ✅ Set and modify catalog locations
+- ✅ Configure storage backend credentials
+
+For detailed information on available operations, see [Manage Relational
Metadata Using Gravitino](./manage-relational-metadata-using-gravitino.md).
+
+### Catalog Properties
+
+| Property   | Description                                  | Example                 | Required | Since Version |
+|------------|----------------------------------------------|-------------------------|----------|---------------|
+| `provider` | Catalog provider type                        | `lakehouse-generic`     | Yes      | 1.1.0         |
+| `location` | Root storage path for all schemas and tables | `s3://bucket/lakehouse` | False    | 1.1.0         |
+
+#### Key Property: `location`
+
+The `location` property specifies the root directory for the lakehouse table.
All schemas and tables are stored under this location unless explicitly
overridden at the schema or table level.
+
+**Location Resolution Hierarchy:**
+1. Table-level `location` (highest priority)
+2. Schema-level `location`, then the location of the table will be
`{schema_location}/{table_name}`
+3. Catalog-level `location` (fallback), then the location of the table will be
`{catalog_location}/{schema_name}/{table_name}`
+
+**Example Location Hierarchy:**
+```
+Catalog location: hdfs://namenode:9000/lakehouse
+└── Schema: sales (hdfs://namenode:9000/lakehouse/sales)
+ ├── Table: orders (hdfs://namenode:9000/lakehouse/sales/orders)
+ └── Table: customers (custom: s3://analytics-bucket/customers)
+```
+
+### Creating a Catalog
+
+Use `provider: "lakehouse-generic"` when creating a generic lakehouse catalog.
+
+<Tabs groupId='language' queryString>
+<TabItem value="shell" label="Shell">
+
+```shell
+curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
+ -H "Content-Type: application/json" -d '{
+ "name": "generic_lakehouse_catalog",
+ "type": "RELATIONAL",
+ "comment": "Generic lakehouse catalog for Lance datasets",
+ "provider": "lakehouse-generic",
+ "properties": {
+ "location": "hdfs://localhost:9000/user/lakehouse"
+ }
+}' http://localhost:8090/api/metalakes/metalake/catalogs
+```
+
+</TabItem>
+<TabItem value="java" label="Java">
+
+```java
+GravitinoClient gravitinoClient = GravitinoClient
+ .builder("http://127.0.0.1:8090")
+ .withMetalake("metalake")
+ .build();
+
+Map<String, String> catalogProperties = ImmutableMap.<String, String>builder()
+ .put("location", "hdfs://localhost:9000/user/lakehouse")
+ .build();
+
+Catalog catalog = gravitinoClient.createCatalog(
+ "generic_lakehouse_catalog",
+ Type.RELATIONAL,
+ "lakehouse-generic",
+ "Generic lakehouse catalog for Lance datasets",
+ catalogProperties
+);
+```
+
+</TabItem>
+</Tabs>
+
+Other catalog operations are general with relational catalogs. See [Catalog
Operations](./manage-relational-metadata-using-gravitino.md#catalog-operations)
for detailed documentation.
+
+## Schema Management
+
+### Capabilities
+
+Schema operations follow the same patterns as relational catalogs:
+
+**Supported Operations:**
+- ✅ Create schemas with custom properties
+- ✅ List all schemas in a catalog
+- ✅ Load schema metadata and properties
+- ✅ Update schema properties
+- ✅ Delete schemas
+- ✅ Check schema existence
+
+See [Schema
Operations](./manage-relational-metadata-using-gravitino.md#schema-operations)
for detailed documentation.
+
+### Schema Properties
+
+Schemas inherit catalog properties and can override specific settings:
+
+| Property | Description |
Example | Required | Since version |
+|------------|----------------------------------------------------------|------------------------------|----------|---------------|
+| `location` | Custom storage root path for all tables under the schema |
's3://bucket/path_to_schema' | No | 1.1.0 |
+
+About location resolution hierarchy, please see [Key Property:
`location`](#key-property-location) in the Catalog Management section for more
details.
+
+### Schema Operations
+
+**Creating a Schema:**
+
+<Tabs groupId='language' queryString>
+<TabItem value="shell" label="Shell">
+
+```shell
+curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
+ -H "Content-Type: application/json" -d '{
+ "name": "sales",
+ "comment": "Sales department data",
+ "properties": {
+ "location": "s3://sales-bucket/data",
+ "owner": "sales-team"
+ }
+}' http://localhost:8090/api/metalakes/metalake/catalogs/lakehouse_catalog/schemas
+```
+
+</TabItem>
+<TabItem value="java" label="Java">
+
+```java
+Map<String, String> schemaProperties = ImmutableMap.<String, String>builder()
+ .put("location", "s3://sales-bucket/data")
+ .put("owner", "sales-team")
+ .build();
+
+catalog.asSchemas().createSchema(
+ "sales",
+ "Sales department data",
+ schemaProperties
+);
+```
+
+</TabItem>
+</Tabs>
+
+For additional operations, refer to [Schema Operations
documentation](./manage-relational-metadata-using-gravitino.md#schema-operations).
+
+### Supported Operations
+
+Since different lakehouse table formats have varying capabilities, table operation support may differ. The following are table operations for different lakehouse formats.
Review Comment:
I think you should add the heading `## Table Management` before this sentence.
##########
docs/lance-rest-service.md:
##########
@@ -0,0 +1,394 @@
+---
+title: "Lance REST service"
+slug: /lance-rest-service
+keywords:
+ - Lance REST
+ - Lance datasets
+ - REST API
+license: "This software is licensed under the Apache License version 2."
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+## Overview
+
+The Lance REST service provides a RESTful interface for managing Lance
datasets through HTTP endpoints. Introduced in Gravitino version 1.1.0, this
service enables seamless interaction with Lance datasets for data operations
and metadata management.
+
+The service implements the [Lance REST API
specification](https://docs.lancedb.com/api-reference/introduction). For
detailed specification documentation, see the [official Lance REST
documentation](https://lance.org/format/namespace/rest/catalog-spec/).
+
+### What is Lance?
+
+[Lance](https://lance.org/format/) is a modern columnar data format designed
for AI/ML workloads. It provides:
+
+- **High-performance vector search**: Native support for similarity search on
high-dimensional embeddings
+- **Columnar storage**: Optimized for analytical queries and machine learning
pipelines
+- **Fast random access**: Efficient row-level operations unlike traditional
columnar formats
+- **Version control**: Built-in dataset versioning and time-travel capabilities
+- **Incremental updates**: Append and update data without full rewrites
+
+### Architecture
+
+The Lance REST service acts as a bridge between Lance datasets and
applications:
+
+```
+┌─────────────────┐
+│ Applications │
+│ (Python/Java) │
+└────────┬────────┘
+ │ HTTP/REST
+ ▼
+┌─────────────────┐
+│ Lance REST │◄──── Gravitino Metalake
+│ Service │ (Metadata Backend)
+└────────┬────────┘
+ │ File System Operations
+ ▼
+┌─────────────────┐
+│ Lance Datasets │
+│ (S3/GCS/Local) │
+└─────────────────┘
+```
+
+**Key Features:**
+- Full compliance with Lance REST API specification
+- Can run standalone or integrated with Gravitino server
+- Support for namespace and table management
+- Index creation and management capabilities (Index operations are not
supported in version 1.1.0)
+- Metadata stored in Gravitino for unified governance
+
+## Supported Operations
+
+The Lance REST service provides comprehensive support for namespace
management, table management, and index operations. The table below lists all
supported operations:
+
+| Operation | Description
| HTTP Method | Endpoint Pattern | Since Version |
+|-------------------|-------------------------------------------------------------------|-------------|---------------------------------------|---------------|
+| CreateNamespace | Create a new Lance namespace
| POST | `/lance/v1/namespace/{id}/create` | 1.1.0 |
+| ListNamespaces | List all namespaces under a parent namespace
| GET | `/lance/v1/namespace/{parent}/list` | 1.1.0 |
+| DescribeNamespace | Retrieve detailed information about a specific namespace
| POST | `/lance/v1/namespace/{id}/describe` | 1.1.0 |
+| DropNamespace | Delete a namespace
| POST | `/lance/v1/namespace/{id}/drop` | 1.1.0 |
+| NamespaceExists | Check whether a namespace exists
| POST | `/lance/v1/namespace/{id}/exists` | 1.1.0 |
+| ListTables | List all tables in a namespace
| GET | `/lance/v1/namespace/{id}/table/list` | 1.1.0 |
+| CreateTable | Create a new table in a namespace
| POST | `/lance/v1/table/{id}/create` | 1.1.0 |
+| DropTable | Delete a table including both metadata and data
| POST | `/lance/v1/table/{id}/drop` | 1.1.0 |
+| TableExists | Check whether a table exists
| POST | `/lance/v1/table/{id}/exists` | 1.1.0 |
+| RegisterTable | Register an existing Lance table to a namespace
| POST | `/lance/v1/table/{id}/register` | 1.1.0 |
+| DeregisterTable | Unregister a table from a namespace (metadata only, data
remains) | POST | `/lance/v1/table/{id}/deregister` | 1.1.0 |
+
+More details, please refer to the [Lance REST API
specification](https://lance.org/format/namespace/rest/catalog-spec/)
+
+### Operation Details
+
+Some operations have specific behaviors and modes. Below are important details
to consider:
+
+#### Namespace Operations
+
+**CreateNamespace** supports three modes:
+- `create`: Fails if namespace already exists
+- `exist_ok`: Succeeds even if namespace exists
+- `overwrite`: Replaces existing namespace
+
+**DropNamespace** behavior:
+- Recursively deletes all child namespaces and tables
+- Deletes both metadata and Lance data files
+- Operation is irreversible
+
+#### Table Operations
+
+**RegisterTable vs CreateTable**:
+- **RegisterTable**: Links existing Lance datasets into Gravitino catalog
without data movement
+- **CreateTable**: Creates new Lance table with schema and write metadata files
+:::
+The `version` field of `CreateTable` response is always null, which stands for
the latest version.
+:::
+
+**DropTable vs DeregisterTable**:
+- **DropTable**: Permanently deletes metadata and data files from storage
+- **DeregisterTable**: Removes metadata from Gravitino but preserves Lance
data files
+
+
+## Deployment
+
+### Running with Gravitino Server
+
+To enable the Lance REST service within Gravitino server, configure the
following properties in your Gravitino configuration file:
+
+| Configuration Property | Description
| Default Value |
Required | Since Version |
+|-------------------------------------------|------------------------------------------------------------------------------|-------------------------|----------|---------------|
+| `gravitino.auxService.names` | Auxiliary services to run.
Include `lance-rest` to enable Lance REST service | iceberg-rest,lance-rest |
Yes | 0.2.0 |
+| `gravitino.lance-rest.classpath` | Classpath for Lance REST
service, relative to Gravitino home directory | lance-rest-server/libs |
Yes | 1.1.0 |
+| `gravitino.lance-rest.httpPort` | Port number for Lance REST
service | 9101 |
Yes | 1.1.0 |
+| `gravitino.lance-rest.host` | Hostname for Lance REST service
| 0.0.0.0 | Yes
| 1.1.0 |
+| `gravitino.lance-rest.namespace-backend` | Namespace metadata backend
(currently only `gravitino` is supported) | gravitino |
Yes | 1.1.0 |
+| `gravitino.lance-rest.gravitino-uri` | Gravitino server URI (required
when namespace-backend is `gravitino`) | http://localhost:8090 | Yes
| 1.1.0 |
+| `gravitino.lance-rest.gravitino-metalake` | Gravitino metalake name
(required when namespace-backend is `gravitino`) | (none)
| Yes | 1.1.0 |
+
+**Example Configuration:**
+
+```properties
+gravitino.auxService.names = lance-rest
+gravitino.lance-rest.httpPort = 9101
+gravitino.lance-rest.host = 0.0.0.0
+gravitino.lance-rest.namespace-backend = gravitino
+gravitino.lance-rest.gravitino-uri = http://localhost:8090
+gravitino.lance-rest.gravitino-metalake = my_metalake
+```
+
+### Running Standalone
+
+To run Lance REST service independently without Gravitino server:
+
+```shell
+{GRAVITINO_HOME}/bin/gravitino-lance-rest-server.sh start
+```
+
+Configure the service by editing
`{GRAVITINO_HOME}/conf/gravitino-lance-rest-server.conf` or passing
command-line arguments:
+
+| Configuration Property | Description |
Default Value | Required | Since Version |
+|-------------------------------------------|-----------------------------|-----------------------|----------|---------------|
+| `gravitino.lance-rest.namespace-backend` | Namespace metadata backend |
gravitino | Yes | 1.1.0 |
+| `gravitino.lance-rest.gravitino-uri` | Gravitino server URI |
http://localhost:8090 | Yes | 1.1.0 |
+| `gravitino.lance-rest.gravitino-metalake` | Gravitino metalake name |
(none) | Yes | 1.1.0 |
+| `gravitino.lance-rest.httpPort` | Service port number |
9101 | No | 1.1.0 |
+| `gravitino.lance-rest.host` | Service hostname |
0.0.0.0 | No | 1.1.0 |
+
+:::tip
+In most cases, you only need to configure
`gravitino.lance-rest.gravitino-metalake` and other properties can use their
default values.
+:::
+
+
+### Running with Docker
+
+Launch Lance REST service using Docker:
+
+```shell
+docker run -d --name lance-rest-service -p 9101:9101 \
+ -e LANCE_REST_GRAVITINO_URI=http://gravitino-host:8090 \
Review Comment:
Does the user need to set up the Gravitino server first?
##########
docs/lakehouse-generic-lance-table.md:
##########
@@ -0,0 +1,294 @@
+---
+title: "Lance table support"
+slug: /lance-table-support
+keywords:
+- lakehouse
+- lance
+- metadata
+- generic catalog
+license: "This software is licensed under the Apache License version 2."
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+
+## Overview
+
+This document describes how to use Apache Gravitino to manage a generic
lakehouse catalog using Lance as the underlying table format.
+
+
+## Table Management
+
+### Supported Operations
+
+For Lance tables in a Generic Lakehouse Catalog, the following table
summarizes supported operations:
+
+| Operation | Support Status |
+|-----------|-----------------|
+| List | ✅ Full |
+| Load | ✅ Full |
+| Alter | Not support now |
+| Create | ✅ Full |
+| Register | ✅ Full |
+| Drop | ✅ Full |
+| Purge | ✅ Full |
+
+:::note Feature Limitations
+- **Partitioning:** Not currently supported
+- **Sort Orders:** Not currently supported
+- **Distributions:** Not currently supported
+- **Indexes:** Not currently supported
+ :::
+
+### Data Type Mappings
+
+Lance uses Apache Arrow for table schemas. The following table shows type
mappings between Gravitino and Arrow:
+
+| Gravitino Type | Arrow Type |
+|----------------------------------|-----------------------------------------|
+| `Struct` | `Struct` |
+| `Map` | `Map` |
+| `List` | `Array` |
+| `Boolean` | `Boolean` |
+| `Byte` | `Int8` |
+| `Short` | `Int16` |
+| `Integer` | `Int32` |
+| `Long` | `Int64` |
+| `Float` | `Float` |
+| `Double` | `Double` |
+| `String` | `Utf8` |
+| `Binary` | `Binary` |
+| `Decimal(p, s)` | `Decimal(p, s)` (128-bit) |
+| `Date` | `Date` |
+| `Timestamp`/`Timestamp(6)` | `TimestampType withoutZone` |
+| `Timestamp(0)` | `TimestampType Second withoutZone` |
+| `Timestamp(3)` | `TimestampType Millisecond withoutZone` |
+| `Timestamp(9)` | `TimestampType Nanosecond withoutZone` |
+| `Timestamp_tz`/`Timestamp_tz(6)` | `TimestampType Microsecond withUtc` |
+| `Timestamp_tz(0)` | `TimestampType Second withUtc` |
+| `Timestamp_tz(3)` | `TimestampType Millisecond withUtc` |
+| `Timestamp_tz(9)` | `TimestampType Nanosecond withUtc` |
+| `Time`/`Time(9)` | `Time Nanosecond` |
+| `Null` | `Null` |
+| `Fixed(n)` | `Fixed-Size Binary(n)` |
+| `Interval_year` | `Interval(YearMonth)` |
+| `Interval_day` | `Duration(Microsecond)` |
+| `External(arrow_field_json_str)` | Any Arrow Field |
+
+### External Type Support
+
+For Arrow types not natively mapped in Gravitino, use the
`External(arrow_field_json_str)` type, which accepts a JSON string
representation of an Arrow `Field`.
+
+**Requirements:**
+- JSON must conform to Apache Arrow [Field
specification](https://github.com/apache/arrow-java/blob/ed81e5981a2bee40584b3a411ed755cb4cc5b91f/vector/src/main/java/org/apache/arrow/vector/types/pojo/Field.java#L80C1-L86C68)
+- `name` attribute must match column name exactly
+- `nullable` attribute must match column nullability
+- `children` array:
+ - Empty for primitive types
+ - Contains child field definitions for complex types (Struct, List)
+
+**Examples:**
+
+| Arrow Type | External Type Definition
|
+|-------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| `Large Utf8` |
`External("{\"name\":\"col_name\",\"nullable\":true,\"type\":{\"name\":\"largeutf8\"},\"children\":[]}")`
|
+| `Large Binary` |
`External("{\"name\":\"col_name\",\"nullable\":true,\"type\":{\"name\":\"largebinary\"},\"children\":[]}")`
|
+| `Large List` |
`External("{\"name\":\"col_name\",\"nullable\":true,\"type\":{\"name\":\"largelist\"},\"children\":[{\"name\":\"element\",\"nullable\":true,\"type\":{\"name\":\"int\",\"bitWidth\":32,\"isSigned\":true},\"children\":[]}]}")`
|
+| `Fixed-Size List` |
`External("{\"name\":\"col_name\",\"nullable\":true,\"type\":{\"name\":\"fixedsizelist\",\"listSize\":10},\"children\":[{\"name\":\"element\",\"nullable\":true,\"type\":{\"name\":\"int\",\"bitWidth\":32,\"isSigned\":true},\"children\":[]}]}")`
|
+
+### Table Properties
+
+Required and optional properties for tables in a Generic Lakehouse Catalog:
+
+| Property | Description | Default | Required | Since Version |
+|-----------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------|--------------|---------------|
+| `format` | Table format: `lance`, currently only `lance` is fully supported. | (none) | Yes | 1.1.0 |
+| `location` | Storage path for table metadata and data, Lance currently supports: S3, GCS, OSS, AZ, File, Memory and file-object-store. | (none) | Conditional* | 1.1.0 |
+| `external` | Whether the data directory is an external location. If it's `true`, dropping a table will only remove metadata in Gravitino and will not delete the data directory, and purge table will delete both. For a non-external table, dropping will drop both. | false | No | 1.1.0 |
+| `lance.creation-mode` | Create mode: for create table, it can be `CREATE`, `EXIST_OK` or `OVERWRITE`. and it should be `CREATE` or `OVERWRITE` for registering tables | `CREATE` | No | 1.1.0 |
+| `lance.register` | Whether it is a register table operation. This API will not create data directory actually and it's the user's responsibility to create and manage the data directory. | false | No | 1.1.0 |
+| `lance.storage.xxxx` | Any additional storage-specific properties required by Lance format (e.g., S3 credentials, HDFS configs). Replace `xxxx` with actual property names. For example, we can use `lance.storage.aws_access_key_id` to set S3 aws_access_key_id when using a S3 location, for detail, please refer to https://lancedb.com/docs/storage/integrations/ | (none) | No | 1.1.0 |
+
+
+**Location Requirement:** Must be specified at catalog, schema, or table
level. See [Location
Resolution](./lakehouse-generic-catalog.md#key-property-location).
+
+You may also set additional properties specific to your lakehouse format or
custom requirements.
+
+### Table Operations
+
+Table operations follow standard relational catalog patterns. See [Table
Operations](./manage-relational-metadata-using-gravitino.md#table-operations)
for comprehensive documentation.
+
+The following sections provide examples and important details for working with
Lance tables.
+
+#### Creating a Lance Table
+
+<Tabs groupId='language' queryString>
+<TabItem value="shell" label="Shell">
+
+```shell
+curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
+ -H "Content-Type: application/json" -d '{
+ "name": "lance_table",
+ "comment": "Example Lance table",
+ "columns": [
+ {
+ "name": "id",
+ "type": "integer",
+ "comment": "Primary identifier",
+ "nullable": false
+ }
+ ],
+ "properties": {
+ "format": "lance",
+ "location": "/tmp/lance_catalog/schema/lance_table"
+ }
+}'
http://localhost:8090/api/metalakes/test/catalogs/generic_lakehouse_lance_catalog/schemas/schema/tables
+```
+
+</TabItem>
+<TabItem value="java" label="Java">
+
+```java
+Catalog catalog =
gravitinoClient.loadCatalog("generic_lakehouse_lance_catalog");
+TableCatalog tableCatalog = catalog.asTableCatalog();
+
+Map<String, String> tableProperties = ImmutableMap.<String, String>builder()
+ .put("format", "lance")
+ .put("location", "/tmp/lance_catalog/schema/example_table")
+ .build();
+
+tableCatalog.createTable(
+ NameIdentifier.of("schema", "lance_table"),
+ new Column[] {
+ Column.of("id", Types.IntegerType.get(), "Primary identifier",
+ false, true, Literals.integerLiteral(-1))
+ },
+ "Example Lance table",
+ tableProperties,
+ null, // partitions
+ null, // distributions
+ null, // sortOrders
+ null // indexes
+);
+```
+
+</TabItem>
+</Tabs>
+
+#### Registering External Tables
+
+Register existing Lance tables without moving or copying data:
+
+<Tabs groupId='language' queryString>
+<TabItem value="shell" label="Shell">
+
+```shell
+curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
+ -H "Content-Type: application/json" -d '{
+ "name": "register_lance_table",
+ "comment": "Registered existing Lance table",
+ "columns": [],
+ "properties": {
+ "format": "lance",
+ "lance.register": "true",
+ "location": "/tmp/lance_catalog/schema/existing_lance_table"
+ }
+}'
http://localhost:8090/api/metalakes/test/catalogs/generic_lakehouse_lance_catalog/schemas/schema/tables
+```
+
+</TabItem>
+<TabItem value="java" label="Java">
+
+```java
+Catalog catalog =
gravitinoClient.loadCatalog("generic_lakehouse_lance_catalog");
+TableCatalog tableCatalog = catalog.asTableCatalog();
+
+Map<String, String> registerProperties = ImmutableMap.<String, String>builder()
+ .put("format", "lance")
+ .put("lance.register", "true")
+ .put("location", "/tmp/lance_catalog/schema/existing_lance_table")
+ .build();
+
+tableCatalog.createTable(
+ NameIdentifier.of("schema", "register_lance_table"),
+ new Column[] {}, // Schema auto-detected from existing table
+ "Registered existing Lance table",
+ registerProperties,
+ null, null, null, null
+);
+```
+
+</TabItem>
+</Tabs>
+
+:::tip Registration vs Creation
+- **Registration** (`lance.register: true`):
+ - Links to existing Lance dataset or a path placeholder
+ - Schema automatically detected from Lance metadata
+ - Useful for importing existing datasets
+
+- **Creation** (default):
+ - Creates new Lance table from scratch
+ - Requires column schema definition
+ - Initializes new Lance dataset files
+:::
+
+## Advanced Topics
+
+### Troubleshooting
+
+#### Common Issues
+
+**Issue: "Location not specified" error**
+```
+Solution: Ensure at least one level (catalog/schema/table) specifies the
location property
+```
+
+**Issue: Permission denied errors**
+```
+Solution: Check file system permissions and credentials for the storage backend
+```
+
+**Issue: Table not found after registration**
+```
+Solution: Verify the location path points to a valid Lance dataset directory
+```
+
+### Migration Guide
+
+#### Migrating Existing Lance Tables
+
+1. **Inventory**: List all existing Lance table locations
+2. **Create Catalog**: Create Generic Lakehouse catalog pointing to root
location
+3. **Register Tables**: Use register operation for each table
+4. **Verify**: Confirm all tables are accessible through Gravitino
+5. **Update Clients**: Point applications to Gravitino metadata instead of
direct Lance access
+
+**Example Migration Script:**
+
+```python
+import lance_namespace as ln
+
+# Connect to Lance REST service
+ns = ln.connect("rest", {"uri": "http://localhost:9101/lance"})
Review Comment:
Where does the `Lance REST service` come from?
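
For context, the same PR documents the Lance REST service in `docs/lance-rest-service.md` as an auxiliary service whose `gravitino.lance-rest.httpPort` defaults to 9101, which appears to be where the `http://localhost:9101/lance` URI in the script comes from. A minimal sketch, assuming that base URI and the endpoint patterns from that file's "Supported Operations" table, of how the URI maps onto the documented endpoints. The helper name `endpoint_url` and the sample identifier `my_table` are hypothetical, and the exact identifier syntax for `{id}` is not covered here:

```python
from typing import Tuple
from urllib.parse import quote

# Base URI taken from the migration script under review; 9101 is the
# documented default for `gravitino.lance-rest.httpPort`.
BASE_URI = "http://localhost:9101/lance"

# A subset of the endpoint patterns listed in docs/lance-rest-service.md,
# written relative to the `/lance` prefix that BASE_URI already carries.
ENDPOINTS = {
    "CreateNamespace": ("POST", "/v1/namespace/{id}/create"),
    "ListTables": ("GET", "/v1/namespace/{id}/table/list"),
    "RegisterTable": ("POST", "/v1/table/{id}/register"),
}


def endpoint_url(operation: str, object_id: str, base_uri: str = BASE_URI) -> Tuple[str, str]:
    """Return (HTTP method, full URL) for one documented operation."""
    method, pattern = ENDPOINTS[operation]
    # Percent-encode the identifier so it stays a single path segment.
    path = pattern.format(id=quote(object_id, safe=""))
    return method, base_uri.rstrip("/") + path


if __name__ == "__main__":
    print(endpoint_url("RegisterTable", "my_table"))
```

If the generic catalog doc keeps this script, a short pointer to the Lance REST service page before the script would answer the question for readers as well.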
##########
docs/lakehouse-generic-lance-table.md:
##########
@@ -0,0 +1,294 @@
+---
+title: "Lance table support"
+slug: /lance-table-support
+keywords:
+- lakehouse
+- lance
+- metadata
+- generic catalog
+license: "This software is licensed under the Apache License version 2."
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+
+## Overview
+
+This document describes how to use Apache Gravitino to manage a generic
lakehouse catalog using Lance as the underlying table format.
+
+
+## Table Management
+
+### Supported Operations
+
+For Lance tables in a Generic Lakehouse Catalog, the following table
summarizes supported operations:
+
+| Operation | Support Status |
+|-----------|-----------------|
+| List | ✅ Full |
+| Load | ✅ Full |
+| Alter | Not support now |
+| Create | ✅ Full |
+| Register | ✅ Full |
+| Drop | ✅ Full |
+| Purge | ✅ Full |
+
+:::note Feature Limitations
+- **Partitioning:** Not currently supported
+- **Sort Orders:** Not currently supported
+- **Distributions:** Not currently supported
+- **Indexes:** Not currently supported
+ :::
Review Comment:
Remove the indent before the closing `:::`.
##########
docs/lance-rest-service.md:
##########
@@ -0,0 +1,394 @@
+---
+title: "Lance REST service"
+slug: /lance-rest-service
+keywords:
+ - Lance REST
+ - Lance datasets
+ - REST API
+license: "This software is licensed under the Apache License version 2."
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+## Overview
+
+The Lance REST service provides a RESTful interface for managing Lance
datasets through HTTP endpoints. Introduced in Gravitino version 1.1.0, this
service enables seamless interaction with Lance datasets for data operations
and metadata management.
+
+The service implements the [Lance REST API
specification](https://docs.lancedb.com/api-reference/introduction). For
detailed specification documentation, see the [official Lance REST
documentation](https://lance.org/format/namespace/rest/catalog-spec/).
+
+### What is Lance?
+
+[Lance](https://lance.org/format/) is a modern columnar data format designed
for AI/ML workloads. It provides:
+
+- **High-performance vector search**: Native support for similarity search on
high-dimensional embeddings
+- **Columnar storage**: Optimized for analytical queries and machine learning
pipelines
+- **Fast random access**: Efficient row-level operations unlike traditional
columnar formats
+- **Version control**: Built-in dataset versioning and time-travel capabilities
+- **Incremental updates**: Append and update data without full rewrites
+
+### Architecture
+
+The Lance REST service acts as a bridge between Lance datasets and
applications:
+
+```
+┌─────────────────┐
+│ Applications │
+│ (Python/Java) │
+└────────┬────────┘
+ │ HTTP/REST
+ ▼
+┌─────────────────┐
+│ Lance REST │◄──── Gravitino Metalake
+│ Service │ (Metadata Backend)
+└────────┬────────┘
+ │ File System Operations
+ ▼
+┌─────────────────┐
+│ Lance Datasets │
+│ (S3/GCS/Local) │
+└─────────────────┘
+```
+
+**Key Features:**
+- Full compliance with Lance REST API specification
+- Can run standalone or integrated with Gravitino server
+- Support for namespace and table management
+- Index creation and management capabilities (Index operations are not
supported in version 1.1.0)
+- Metadata stored in Gravitino for unified governance
+
+## Supported Operations
+
+The Lance REST service provides comprehensive support for namespace
management, table management, and index operations. The table below lists all
supported operations:
+
+| Operation | Description
| HTTP Method | Endpoint Pattern | Since Version |
+|-------------------|-------------------------------------------------------------------|-------------|---------------------------------------|---------------|
+| CreateNamespace | Create a new Lance namespace
| POST | `/lance/v1/namespace/{id}/create` | 1.1.0 |
+| ListNamespaces | List all namespaces under a parent namespace
| GET | `/lance/v1/namespace/{parent}/list` | 1.1.0 |
+| DescribeNamespace | Retrieve detailed information about a specific namespace
| POST | `/lance/v1/namespace/{id}/describe` | 1.1.0 |
+| DropNamespace | Delete a namespace
| POST | `/lance/v1/namespace/{id}/drop` | 1.1.0 |
+| NamespaceExists | Check whether a namespace exists
| POST | `/lance/v1/namespace/{id}/exists` | 1.1.0 |
+| ListTables | List all tables in a namespace
| GET | `/lance/v1/namespace/{id}/table/list` | 1.1.0 |
+| CreateTable | Create a new table in a namespace
| POST | `/lance/v1/table/{id}/create` | 1.1.0 |
+| DropTable | Delete a table including both metadata and data
| POST | `/lance/v1/table/{id}/drop` | 1.1.0 |
+| TableExists | Check whether a table exists
| POST | `/lance/v1/table/{id}/exists` | 1.1.0 |
+| RegisterTable | Register an existing Lance table to a namespace
| POST | `/lance/v1/table/{id}/register` | 1.1.0 |
+| DeregisterTable | Unregister a table from a namespace (metadata only, data
remains) | POST | `/lance/v1/table/{id}/deregister` | 1.1.0 |
+
+More details, please refer to the [Lance REST API
specification](https://lance.org/format/namespace/rest/catalog-spec/)
+
+### Operation Details
+
+Some operations have specific behaviors and modes. Below are important details
to consider:
+
+#### Namespace Operations
+
+**CreateNamespace** supports three modes:
+- `create`: Fails if namespace already exists
+- `exist_ok`: Succeeds even if namespace exists
+- `overwrite`: Replaces existing namespace
+
+**DropNamespace** behavior:
+- Recursively deletes all child namespaces and tables
+- Deletes both metadata and Lance data files
+- Operation is irreversible
+
+#### Table Operations
+
+**RegisterTable vs CreateTable**:
+- **RegisterTable**: Links existing Lance datasets into Gravitino catalog
without data movement
+- **CreateTable**: Creates new Lance table with schema and write metadata files
+:::
Review Comment:
Should this be `:::info` or `:::note`?
##########
docs/lance-rest-service.md:
##########
@@ -0,0 +1,394 @@
+---
+title: "Lance REST service"
+slug: /lance-rest-service
+keywords:
+ - Lance REST
+ - Lance datasets
+ - REST API
+license: "This software is licensed under the Apache License version 2."
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+## Overview
+
+The Lance REST service provides a RESTful interface for managing Lance
datasets through HTTP endpoints. Introduced in Gravitino version 1.1.0, this
service enables seamless interaction with Lance datasets for data operations
and metadata management.
+
+The service implements the [Lance REST API
specification](https://docs.lancedb.com/api-reference/introduction). For
detailed specification documentation, see the [official Lance REST
documentation](https://lance.org/format/namespace/rest/catalog-spec/).
+
+### What is Lance?
+
+[Lance](https://lance.org/format/) is a modern columnar data format designed
for AI/ML workloads. It provides:
+
+- **High-performance vector search**: Native support for similarity search on
high-dimensional embeddings
+- **Columnar storage**: Optimized for analytical queries and machine learning
pipelines
+- **Fast random access**: Efficient row-level operations unlike traditional
columnar formats
+- **Version control**: Built-in dataset versioning and time-travel capabilities
+- **Incremental updates**: Append and update data without full rewrites
+
+### Architecture
+
+The Lance REST service acts as a bridge between Lance datasets and
applications:
+
+```
+┌─────────────────┐
+│ Applications │
+│ (Python/Java) │
+└────────┬────────┘
+ │ HTTP/REST
+ ▼
+┌─────────────────┐
+│ Lance REST │◄──── Gravitino Metalake
Review Comment:
What does the direction of this arrow mean?
##########
docs/lance-rest-service.md:
##########
@@ -0,0 +1,394 @@
+---
+title: "Lance REST service"
+slug: /lance-rest-service
+keywords:
+ - Lance REST
+ - Lance datasets
+ - REST API
+license: "This software is licensed under the Apache License version 2."
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+## Overview
+
+The Lance REST service provides a RESTful interface for managing Lance
datasets through HTTP endpoints. Introduced in Gravitino version 1.1.0, this
service enables seamless interaction with Lance datasets for data operations
and metadata management.
+
+The service implements the [Lance REST API
specification](https://docs.lancedb.com/api-reference/introduction). For
detailed specification documentation, see the [official Lance REST
documentation](https://lance.org/format/namespace/rest/catalog-spec/).
+
+### What is Lance?
+
+[Lance](https://lance.org/format/) is a modern columnar data format designed
for AI/ML workloads. It provides:
+
+- **High-performance vector search**: Native support for similarity search on
high-dimensional embeddings
+- **Columnar storage**: Optimized for analytical queries and machine learning
pipelines
+- **Fast random access**: Efficient row-level operations unlike traditional
columnar formats
+- **Version control**: Built-in dataset versioning and time-travel capabilities
+- **Incremental updates**: Append and update data without full rewrites
+
+### Architecture
+
+The Lance REST service acts as a bridge between Lance datasets and
applications:
+
+```
+┌─────────────────┐
+│ Applications │
+│ (Python/Java) │
+└────────┬────────┘
+ │ HTTP/REST
+ ▼
+┌─────────────────┐
+│ Lance REST │◄──── Gravitino Metalake
+│ Service │ (Metadata Backend)
+└────────┬────────┘
+ │ File System Operations
+ ▼
+┌─────────────────┐
+│ Lance Datasets │
+│ (S3/GCS/Local) │
+└─────────────────┘
+```
+
+**Key Features:**
+- Full compliance with Lance REST API specification
+- Can run standalone or integrated with Gravitino server
+- Support for namespace and table management
+- Index creation and management capabilities (Index operations are not
supported in version 1.1.0)
+- Metadata stored in Gravitino for unified governance
+
+## Supported Operations
+
+The Lance REST service provides comprehensive support for namespace
management, table management, and index operations. The table below lists all
supported operations:
+
+| Operation | Description
| HTTP Method | Endpoint Pattern | Since Version |
+|-------------------|-------------------------------------------------------------------|-------------|---------------------------------------|---------------|
+| CreateNamespace | Create a new Lance namespace
| POST | `/lance/v1/namespace/{id}/create` | 1.1.0 |
+| ListNamespaces | List all namespaces under a parent namespace
| GET | `/lance/v1/namespace/{parent}/list` | 1.1.0 |
+| DescribeNamespace | Retrieve detailed information about a specific namespace
| POST | `/lance/v1/namespace/{id}/describe` | 1.1.0 |
+| DropNamespace | Delete a namespace
| POST | `/lance/v1/namespace/{id}/drop` | 1.1.0 |
+| NamespaceExists | Check whether a namespace exists
| POST | `/lance/v1/namespace/{id}/exists` | 1.1.0 |
+| ListTables | List all tables in a namespace
| GET | `/lance/v1/namespace/{id}/table/list` | 1.1.0 |
+| CreateTable | Create a new table in a namespace
| POST | `/lance/v1/table/{id}/create` | 1.1.0 |
+| DropTable | Delete a table including both metadata and data
| POST | `/lance/v1/table/{id}/drop` | 1.1.0 |
+| TableExists | Check whether a table exists
| POST | `/lance/v1/table/{id}/exists` | 1.1.0 |
+| RegisterTable | Register an existing Lance table to a namespace
| POST | `/lance/v1/table/{id}/register` | 1.1.0 |
+| DeregisterTable | Unregister a table from a namespace (metadata only, data
remains) | POST | `/lance/v1/table/{id}/deregister` | 1.1.0 |
+
+More details, please refer to the [Lance REST API
specification](https://lance.org/format/namespace/rest/catalog-spec/)
+
+### Operation Details
+
+Some operations have specific behaviors and modes. Below are important details
to consider:
+
+#### Namespace Operations
+
+**CreateNamespace** supports three modes:
+- `create`: Fails if namespace already exists
+- `exist_ok`: Succeeds even if namespace exists
+- `overwrite`: Replaces existing namespace
+
+**DropNamespace** behavior:
+- Recursively deletes all child namespaces and tables
+- Deletes both metadata and Lance data files
+- Operation is irreversible
+
+#### Table Operations
+
+**RegisterTable vs CreateTable**:
+- **RegisterTable**: Links existing Lance datasets into Gravitino catalog
without data movement
+- **CreateTable**: Creates new Lance table with schema and write metadata files
+:::
+The `version` field of `CreateTable` response is always null, which stands for
the latest version.
+:::
+
+**DropTable vs DeregisterTable**:
+- **DropTable**: Permanently deletes metadata and data files from storage
+- **DeregisterTable**: Removes metadata from Gravitino but preserves Lance
data files
+
+
+## Deployment
+
+### Running with Gravitino Server
+
+To enable the Lance REST service within Gravitino server, configure the
following properties in your Gravitino configuration file:
Review Comment:
It would be better to state the specific file name here.
##########
docs/lakehouse-generic-lance-table.md:
##########
@@ -0,0 +1,294 @@
+---
+title: "Lance table support"
+slug: /lance-table-support
+keywords:
+- lakehouse
+- lance
+- metadata
+- generic catalog
+license: "This software is licensed under the Apache License version 2."
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+
+## Overview
+
+This document describes how to use Apache Gravitino to manage a generic
lakehouse catalog using Lance as the underlying table format.
+
+
+## Table Management
+
+### Supported Operations
+
+For Lance tables in a Generic Lakehouse Catalog, the following table
summarizes supported operations:
+
+| Operation | Support Status |
+|-----------|-----------------|
+| List | ✅ Full |
+| Load | ✅ Full |
+| Alter | Not support now |
+| Create | ✅ Full |
+| Register | ✅ Full |
+| Drop | ✅ Full |
+| Purge | ✅ Full |
+
+:::note Feature Limitations
+- **Partitioning:** Not currently supported
+- **Sort Orders:** Not currently supported
+- **Distributions:** Not currently supported
+- **Indexes:** Not currently supported
+ :::
+
+### Data Type Mappings
+
+Lance uses Apache Arrow for table schemas. The following table shows type
mappings between Gravitino and Arrow:
+
+| Gravitino Type | Arrow Type |
+|----------------------------------|-----------------------------------------|
+| `Struct` | `Struct` |
+| `Map` | `Map` |
+| `List` | `Array` |
+| `Boolean` | `Boolean` |
+| `Byte` | `Int8` |
+| `Short` | `Int16` |
+| `Integer` | `Int32` |
+| `Long` | `Int64` |
+| `Float` | `Float` |
+| `Double` | `Double` |
+| `String` | `Utf8` |
+| `Binary` | `Binary` |
+| `Decimal(p, s)` | `Decimal(p, s)` (128-bit) |
+| `Date` | `Date` |
+| `Timestamp`/`Timestamp(6)` | `TimestampType withoutZone` |
+| `Timestamp(0)` | `TimestampType Second withoutZone` |
+| `Timestamp(3)` | `TimestampType Millisecond withoutZone` |
+| `Timestamp(9)` | `TimestampType Nanosecond withoutZone` |
+| `Timestamp_tz`/`Timestamp_tz(6)` | `TimestampType Microsecond withUtc` |
+| `Timestamp_tz(0)` | `TimestampType Second withUtc` |
+| `Timestamp_tz(3)` | `TimestampType Millisecond withUtc` |
+| `Timestamp_tz(9)` | `TimestampType Nanosecond withUtc` |
+| `Time`/`Time(9)` | `Time Nanosecond` |
+| `Null` | `Null` |
+| `Fixed(n)` | `Fixed-Size Binary(n)` |
+| `Interval_year` | `Interval(YearMonth)` |
+| `Interval_day` | `Duration(Microsecond)` |
+| `External(arrow_field_json_str)` | Any Arrow Field |
+
+### External Type Support
+
+For Arrow types not natively mapped in Gravitino, use the
`External(arrow_field_json_str)` type, which accepts a JSON string
representation of an Arrow `Field`.
+
+**Requirements:**
+- JSON must conform to Apache Arrow [Field
specification](https://github.com/apache/arrow-java/blob/ed81e5981a2bee40584b3a411ed755cb4cc5b91f/vector/src/main/java/org/apache/arrow/vector/types/pojo/Field.java#L80C1-L86C68)
+- `name` attribute must match column name exactly
+- `nullable` attribute must match column nullability
+- `children` array:
+ - Empty for primitive types
+ - Contains child field definitions for complex types (Struct, List)
+
+**Examples:**
+
+| Arrow Type | External Type Definition
|
+|-------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| `Large Utf8` |
`External("{\"name\":\"col_name\",\"nullable\":true,\"type\":{\"name\":\"largeutf8\"},\"children\":[]}")`
|
+| `Large Binary` |
`External("{\"name\":\"col_name\",\"nullable\":true,\"type\":{\"name\":\"largebinary\"},\"children\":[]}")`
|
+| `Large List` |
`External("{\"name\":\"col_name\",\"nullable\":true,\"type\":{\"name\":\"largelist\"},\"children\":[{\"name\":\"element\",\"nullable\":true,\"type\":{\"name\":\"int\",\"bitWidth\":32,\"isSigned\":true},\"children\":[]}]}")`
|
+| `Fixed-Size List` |
`External("{\"name\":\"col_name\",\"nullable\":true,\"type\":{\"name\":\"fixedsizelist\",\"listSize\":10},\"children\":[{\"name\":\"element\",\"nullable\":true,\"type\":{\"name\":\"int\",\"bitWidth\":32,\"isSigned\":true},\"children\":[]}]}")`
|
+
+### Table Properties
+
+Required and optional properties for tables in a Generic Lakehouse Catalog:
+
+| Property | Description
| Default | Required | Since
Version |
+|-----------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------|--------------|---------------|
+| `format` | Table format: `lance`, currently only `lance` is
fully supported.
| (none) | Yes |
1.1.0 |
+| `location` | Storage path for table metadata and data, Lance
currently supports: S3, GCS, OSS, AZ, File, Memory and file-object-store.
| (none) | Conditional* |
1.1.0 |
+| `external` | Whether the data directory is an external location.
If it's `true`, dropping a table will only remove metadata in Gravitino and
will not delete the data directory, and purge table will delete both. For a
non-external table, dropping will drop both.
| false | No |
1.1.0 |
+| `lance.creation-mode` | Create mode: for create table, it can be `CREATE`,
`EXIST_OK` or `OVERWRITE`. and it should be `CREATE` or `OVERWRITE` for
registering tables
| `CREATE` | No
| 1.1.0 |
+| `lance.register` | Whether it is a register table operation. This API
will not create data directory actually and it's the user's responsibility to
create and manage the data directory.
| false | No |
1.1.0 |
+| `lance.storage.xxxx` | Any additional storage-specific properties required
by Lance format (e.g., S3 credentials, HDFS configs). Replace `xxxx` with
actual property names. For example, we can use
`lance.storage.aws_access_key_id` to set S3 aws_access_key_id when using a S3
location, for detail, please refer to
https://lancedb.com/docs/storage/integrations/ | (none) | No |
1.1.0 |
+
+
+**Location Requirement:** Must be specified at catalog, schema, or table
level. See [Location
Resolution](./lakehouse-generic-catalog.md#key-property-location).
+
+You may also set additional properties specific to your lakehouse format or
custom requirements.
+
+### Table Operations
+
+Table operations follow standard relational catalog patterns. See [Table
Operations](./manage-relational-metadata-using-gravitino.md#table-operations)
for comprehensive documentation.
+
+The following sections provide examples and important details for working with
Lance tables.
+
+#### Creating a Lance Table
+
+<Tabs groupId='language' queryString>
+<TabItem value="shell" label="Shell">
+
+```shell
+curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
+ -H "Content-Type: application/json" -d '{
+ "name": "lance_table",
+ "comment": "Example Lance table",
+ "columns": [
+ {
+ "name": "id",
+ "type": "integer",
+ "comment": "Primary identifier",
+ "nullable": false
+ }
+ ],
+ "properties": {
+ "format": "lance",
+ "location": "/tmp/lance_catalog/schema/lance_table"
+ }
+}'
http://localhost:8090/api/metalakes/test/catalogs/generic_lakehouse_lance_catalog/schemas/schema/tables
+```
+
+</TabItem>
+<TabItem value="java" label="Java">
+
+```java
+Catalog catalog =
gravitinoClient.loadCatalog("generic_lakehouse_lance_catalog");
+TableCatalog tableCatalog = catalog.asTableCatalog();
+
+Map<String, String> tableProperties = ImmutableMap.<String, String>builder()
+ .put("format", "lance")
+ .put("location", "/tmp/lance_catalog/schema/example_table")
+ .build();
+
+tableCatalog.createTable(
+ NameIdentifier.of("schema", "lance_table"),
+ new Column[] {
+ Column.of("id", Types.IntegerType.get(), "Primary identifier",
+ false, true, Literals.integerLiteral(-1))
Review Comment:
Does Lance support column default values?
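
For comparison, the Shell tab in the same hunk defines the `id` column without any default value, while the Java tab passes `Literals.integerLiteral(-1)`. A minimal sketch, assuming only the fields shown in that curl example, that rebuilds the request body in Python and lists the keys the column actually carries. The helper name `build_create_table_body` is hypothetical:

```python
import json


def build_create_table_body(name: str, location: str) -> dict:
    """Mirror the JSON body of the Shell tab's create-table request."""
    return {
        "name": name,
        "comment": "Example Lance table",
        "columns": [
            {
                "name": "id",
                "type": "integer",
                "comment": "Primary identifier",
                "nullable": False,
            }
        ],
        "properties": {"format": "lance", "location": location},
    }


body = build_create_table_body("lance_table", "/tmp/lance_catalog/schema/lance_table")
payload = json.dumps(body, indent=2)

if __name__ == "__main__":
    # Only name, type, comment and nullable are sent; no default value.
    print(sorted(body["columns"][0]))
```

The two tabs also differ in `location`: the Shell tab uses `/tmp/lance_catalog/schema/lance_table`, while the Java tab uses `/tmp/lance_catalog/schema/example_table`.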
##########
docs/lance-rest-service.md:
##########
@@ -0,0 +1,394 @@
+---
+title: "Lance REST service"
+slug: /lance-rest-service
+keywords:
+ - Lance REST
+ - Lance datasets
+ - REST API
+license: "This software is licensed under the Apache License version 2."
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+## Overview
+
+The Lance REST service provides a RESTful interface for managing Lance
datasets through HTTP endpoints. Introduced in Gravitino version 1.1.0, this
service enables seamless interaction with Lance datasets for data operations
and metadata management.
+
+The service implements the [Lance REST API
specification](https://docs.lancedb.com/api-reference/introduction). For
detailed specification documentation, see the [official Lance REST
documentation](https://lance.org/format/namespace/rest/catalog-spec/).
+
+### What is Lance?
+
+[Lance](https://lance.org/format/) is a modern columnar data format designed
for AI/ML workloads. It provides:
+
+- **High-performance vector search**: Native support for similarity search on
high-dimensional embeddings
+- **Columnar storage**: Optimized for analytical queries and machine learning
pipelines
+- **Fast random access**: Efficient row-level operations unlike traditional
columnar formats
+- **Version control**: Built-in dataset versioning and time-travel capabilities
+- **Incremental updates**: Append and update data without full rewrites
+
+### Architecture
+
+The Lance REST service acts as a bridge between Lance datasets and
applications:
+
+```
+┌─────────────────┐
+│ Applications │
+│ (Python/Java) │
+└────────┬────────┘
+ │ HTTP/REST
+ ▼
+┌─────────────────┐
+│ Lance REST │◄──── Gravitino Metalake
+│ Service │ (Metadata Backend)
+└────────┬────────┘
+ │ File System Operations
+ ▼
+┌─────────────────┐
+│ Lance Datasets │
+│ (S3/GCS/Local) │
+└─────────────────┘
+```
+
+**Key Features:**
+- Full compliance with Lance REST API specification
+- Can run standalone or integrated with Gravitino server
+- Support for namespace and table management
+- Index creation and management capabilities (Index operations are not
supported in version 1.1.0)
+- Metadata stored in Gravitino for unified governance
+
+## Supported Operations
+
+The Lance REST service provides comprehensive support for namespace
management, table management, and index operations. The table below lists all
supported operations:
+
+| Operation | Description
| HTTP Method | Endpoint Pattern | Since Version |
+|-------------------|-------------------------------------------------------------------|-------------|---------------------------------------|---------------|
+| CreateNamespace | Create a new Lance namespace
| POST | `/lance/v1/namespace/{id}/create` | 1.1.0 |
+| ListNamespaces | List all namespaces under a parent namespace
| GET | `/lance/v1/namespace/{parent}/list` | 1.1.0 |
+| DescribeNamespace | Retrieve detailed information about a specific namespace
| POST | `/lance/v1/namespace/{id}/describe` | 1.1.0 |
+| DropNamespace | Delete a namespace
| POST | `/lance/v1/namespace/{id}/drop` | 1.1.0 |
+| NamespaceExists | Check whether a namespace exists
| POST | `/lance/v1/namespace/{id}/exists` | 1.1.0 |
+| ListTables | List all tables in a namespace
| GET | `/lance/v1/namespace/{id}/table/list` | 1.1.0 |
+| CreateTable | Create a new table in a namespace
| POST | `/lance/v1/table/{id}/create` | 1.1.0 |
+| DropTable | Delete a table including both metadata and data
| POST | `/lance/v1/table/{id}/drop` | 1.1.0 |
+| TableExists | Check whether a table exists
| POST | `/lance/v1/table/{id}/exists` | 1.1.0 |
+| RegisterTable | Register an existing Lance table to a namespace
| POST | `/lance/v1/table/{id}/register` | 1.1.0 |
+| DeregisterTable | Unregister a table from a namespace (metadata only, data
remains) | POST | `/lance/v1/table/{id}/deregister` | 1.1.0 |
+
+More details, please refer to the [Lance REST API
specification](https://lance.org/format/namespace/rest/catalog-spec/)
+
+### Operation Details
+
+Some operations have specific behaviors and modes. Below are important details
to consider:
+
+#### Namespace Operations
+
+**CreateNamespace** supports three modes:
+- `create`: Fails if namespace already exists
+- `exist_ok`: Succeeds even if namespace exists
+- `overwrite`: Replaces existing namespace
+
+**DropNamespace** behavior:
+- Recursively deletes all child namespaces and tables
+- Deletes both metadata and Lance data files
+- Operation is irreversible
+
+#### Table Operations
+
+**RegisterTable vs CreateTable**:
+- **RegisterTable**: Links existing Lance datasets into Gravitino catalog
without data movement
+- **CreateTable**: Creates new Lance table with schema and write metadata files
+:::
+The `version` field of `CreateTable` response is always null, which stands for
the latest version.
+:::
+
+**DropTable vs DeregisterTable**:
+- **DropTable**: Permanently deletes metadata and data files from storage
+- **DeregisterTable**: Removes metadata from Gravitino but preserves Lance
data files
+
+
+## Deployment
+
+### Running with Gravitino Server
+
+To enable the Lance REST service within Gravitino server, configure the
following properties in your Gravitino configuration file:
+
+| Configuration Property | Description | Default Value | Required | Since Version |
+|-------------------------------------------|------------------------------------------------------------------------------|-------------------------|----------|---------------|
+| `gravitino.auxService.names` | Auxiliary services to run. Include `lance-rest` to enable Lance REST service | iceberg-rest,lance-rest | Yes | 0.2.0 |
+| `gravitino.lance-rest.classpath` | Classpath for Lance REST service, relative to Gravitino home directory | lance-rest-server/libs | Yes | 1.1.0 |
+| `gravitino.lance-rest.httpPort` | Port number for Lance REST service | 9101 | Yes | 1.1.0 |
+| `gravitino.lance-rest.host` | Hostname for Lance REST service | 0.0.0.0 | Yes | 1.1.0 |
+| `gravitino.lance-rest.namespace-backend` | Namespace metadata backend (currently only `gravitino` is supported) | gravitino | Yes | 1.1.0 |
+| `gravitino.lance-rest.gravitino-uri` | Gravitino server URI (required when namespace-backend is `gravitino`) | http://localhost:8090 | Yes | 1.1.0 |
+| `gravitino.lance-rest.gravitino-metalake` | Gravitino metalake name (required when namespace-backend is `gravitino`) | (none) | Yes | 1.1.0 |
+
+**Example Configuration:**
+
+```properties
+gravitino.auxService.names = lance-rest
+gravitino.lance-rest.httpPort = 9101
+gravitino.lance-rest.host = 0.0.0.0
+gravitino.lance-rest.namespace-backend = gravitino
+gravitino.lance-rest.gravitino-uri = http://localhost:8090
+gravitino.lance-rest.gravitino-metalake = my_metalake
+```
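
A configuration like the one above can be sanity-checked before starting the server. The following is a minimal sketch, assuming the defaults listed in the table are accurate. `DEFAULTS`, `parse_properties`, and `check_lance_rest` are hypothetical helpers, not part of Gravitino, and the parser handles only plain `key = value` lines.

```python
# Defaults copied from the configuration table above (a subset); None marks "(none)".
DEFAULTS = {
    "gravitino.lance-rest.httpPort": "9101",
    "gravitino.lance-rest.host": "0.0.0.0",
    "gravitino.lance-rest.namespace-backend": "gravitino",
    "gravitino.lance-rest.gravitino-uri": "http://localhost:8090",
    "gravitino.lance-rest.gravitino-metalake": None,
}


def parse_properties(text: str) -> dict:
    """Parse simple `key = value` lines, skipping blank lines and `#` comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_lance_rest(props: dict) -> list:
    """Return a list of problems; an empty list means the check passed."""
    problems = []
    services = [s.strip() for s in props.get("gravitino.auxService.names", "").split(",")]
    if "lance-rest" not in services:
        problems.append("gravitino.auxService.names does not include lance-rest")
    for key, default in DEFAULTS.items():
        # Only properties without a default must be set explicitly.
        if key not in props and default is None:
            problems.append(f"{key} has no default and is not set")
    return problems
```

The check treats a property as mandatory only when the table gives no default for it, which in the table above applies to `gravitino.lance-rest.gravitino-metalake`.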
+
+### Running Standalone
+
+To run Lance REST service independently without Gravitino server:
+
+```shell
+{GRAVITINO_HOME}/bin/gravitino-lance-rest-server.sh start
+```
+
+Configure the service by editing
`{GRAVITINO_HOME}/conf/gravitino-lance-rest-server.conf` or passing
command-line arguments:
+
+| Configuration Property | Description | Default Value | Required | Since Version |
+|-------------------------------------------|-----------------------------|-----------------------|----------|---------------|
+| `gravitino.lance-rest.namespace-backend` | Namespace metadata backend | gravitino | Yes | 1.1.0 |
+| `gravitino.lance-rest.gravitino-uri` | Gravitino server URI | http://localhost:8090 | Yes | 1.1.0 |
+| `gravitino.lance-rest.gravitino-metalake` | Gravitino metalake name | (none) | Yes | 1.1.0 |
+| `gravitino.lance-rest.httpPort` | Service port number | 9101 | No | 1.1.0 |
+| `gravitino.lance-rest.host` | Service hostname | 0.0.0.0 | No | 1.1.0 |
Review Comment:
Are they optional in standalone mode but required in embedded mode?
##########
docs/lakehouse-generic-lance-table.md:
##########
@@ -0,0 +1,294 @@
+---
+title: "Lance table support"
+slug: /lance-table-support
+keywords:
+- lakehouse
+- lance
+- metadata
+- generic catalog
+license: "This software is licensed under the Apache License version 2."
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+
+## Overview
+
+This document describes how to use Apache Gravitino to manage a generic
lakehouse catalog using Lance as the underlying table format.
+
+
+## Table Management
+
+### Supported Operations
+
+For Lance tables in a Generic Lakehouse Catalog, the following table
summarizes supported operations:
+
+| Operation | Support Status |
+|-----------|-----------------|
+| List | ✅ Full |
+| Load | ✅ Full |
+| Alter | Not supported yet |
+| Create | ✅ Full |
+| Register | ✅ Full |
+| Drop | ✅ Full |
+| Purge | ✅ Full |
+
+:::note Feature Limitations
+- **Partitioning:** Not currently supported
+- **Sort Orders:** Not currently supported
+- **Distributions:** Not currently supported
+- **Indexes:** Not currently supported
+:::
+
+### Data Type Mappings
+
+Lance uses Apache Arrow for table schemas. The following table shows type
mappings between Gravitino and Arrow:
+
+| Gravitino Type | Arrow Type |
+|----------------------------------|-----------------------------------------|
+| `Struct` | `Struct` |
+| `Map` | `Map` |
+| `List` | `Array` |
+| `Boolean` | `Boolean` |
+| `Byte` | `Int8` |
+| `Short` | `Int16` |
+| `Integer` | `Int32` |
+| `Long` | `Int64` |
+| `Float` | `Float` |
+| `Double` | `Double` |
+| `String` | `Utf8` |
+| `Binary` | `Binary` |
+| `Decimal(p, s)` | `Decimal(p, s)` (128-bit) |
+| `Date` | `Date` |
+| `Timestamp`/`Timestamp(6)` | `TimestampType Microsecond withoutZone` |
+| `Timestamp(0)` | `TimestampType Second withoutZone` |
+| `Timestamp(3)` | `TimestampType Millisecond withoutZone` |
+| `Timestamp(9)` | `TimestampType Nanosecond withoutZone` |
+| `Timestamp_tz`/`Timestamp_tz(6)` | `TimestampType Microsecond withUtc` |
+| `Timestamp_tz(0)` | `TimestampType Second withUtc` |
+| `Timestamp_tz(3)` | `TimestampType Millisecond withUtc` |
+| `Timestamp_tz(9)` | `TimestampType Nanosecond withUtc` |
+| `Time`/`Time(9)` | `Time Nanosecond` |
+| `Null` | `Null` |
+| `Fixed(n)` | `Fixed-Size Binary(n)` |
+| `Interval_year` | `Interval(YearMonth)` |
+| `Interval_day` | `Duration(Microsecond)` |
+| `External(arrow_field_json_str)` | Any Arrow Field |
+
+### External Type Support
+
+For Arrow types not natively mapped in Gravitino, use the
`External(arrow_field_json_str)` type, which accepts a JSON string
representation of an Arrow `Field`.
+
+**Requirements:**
+- JSON must conform to Apache Arrow [Field
specification](https://github.com/apache/arrow-java/blob/ed81e5981a2bee40584b3a411ed755cb4cc5b91f/vector/src/main/java/org/apache/arrow/vector/types/pojo/Field.java#L80C1-L86C68)
+- `name` attribute must match column name exactly
+- `nullable` attribute must match column nullability
+- `children` array:
+ - Empty for primitive types
+ - Contains child field definitions for complex types (Struct, List)
+
+**Examples:**
+
+| Arrow Type | External Type Definition |
+|-------------------|--------------------------|
+| `Large Utf8` | `External("{\"name\":\"col_name\",\"nullable\":true,\"type\":{\"name\":\"largeutf8\"},\"children\":[]}")` |
+| `Large Binary` | `External("{\"name\":\"col_name\",\"nullable\":true,\"type\":{\"name\":\"largebinary\"},\"children\":[]}")` |
+| `Large List` | `External("{\"name\":\"col_name\",\"nullable\":true,\"type\":{\"name\":\"largelist\"},\"children\":[{\"name\":\"element\",\"nullable\":true,\"type\":{\"name\":\"int\",\"bitWidth\":32,\"isSigned\":true},\"children\":[]}]}")` |
+| `Fixed-Size List` | `External("{\"name\":\"col_name\",\"nullable\":true,\"type\":{\"name\":\"fixedsizelist\",\"listSize\":10},\"children\":[{\"name\":\"element\",\"nullable\":true,\"type\":{\"name\":\"int\",\"bitWidth\":32,\"isSigned\":true},\"children\":[]}]}")` |
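
Writing the escaped JSON in these examples by hand is error-prone, so it may help to generate it. The sketch below uses only Python's standard `json` module. `arrow_field` and `external_type` are hypothetical helper names, and it is an assumption that the compact, backslash-escaped form shown in the table is what the catalog expects.

```python
import json


def arrow_field(name: str, arrow_type: dict, nullable: bool = True, children=None) -> dict:
    """Build the dict form of an Arrow Field, with the keys used in the examples above."""
    return {
        "name": name,
        "nullable": nullable,
        "type": arrow_type,
        "children": children or [],  # empty for primitive types
    }


def external_type(field: dict) -> str:
    """Serialize a field dict to compact JSON and wrap it as External("...")."""
    compact = json.dumps(field, separators=(",", ":"))
    # Escape backslashes first, then double quotes, so the JSON can sit inside a quoted string.
    escaped = compact.replace("\\", "\\\\").replace('"', '\\"')
    return f'External("{escaped}")'


# Fixed-size list of 10 nullable 32-bit signed integers, as in the last table row.
element = arrow_field("element", {"name": "int", "bitWidth": 32, "isSigned": True})
fixed_size_list = arrow_field(
    "col_name", {"name": "fixedsizelist", "listSize": 10}, children=[element]
)
print(external_type(fixed_size_list))
```

Building the field as a dict first keeps the `name`, `nullable`, and `children` requirements visible in one place before any escaping is applied.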
+
+### Table Properties
+
+Required and optional properties for tables in a Generic Lakehouse Catalog:
+
+| Property | Description | Default | Required | Since Version |
+|-----------------------|-------------|----------|--------------|---------------|
+| `format` | Table format. Currently only `lance` is fully supported. | (none) | Yes | 1.1.0 |
+| `location` | Storage path for table metadata and data. Lance currently supports: S3, GCS, OSS, AZ, File, Memory and file-object-store. | (none) | Conditional* | 1.1.0 |
+| `external` | Whether the data directory is an external location. If `true`, dropping the table removes only the metadata in Gravitino and keeps the data directory, while purging the table deletes both. For a non-external table, dropping deletes both. | false | No | 1.1.0 |
+| `lance.creation-mode` | Creation mode. When creating a table, it can be `CREATE`, `EXIST_OK` or `OVERWRITE`; when registering a table, it must be `CREATE` or `OVERWRITE`. | `CREATE` | No | 1.1.0 |
+| `lance.register` | Whether it is a register table operation. This API will not create data directory actually and it's the user's responsibility to create and manage the data directory. | false | No | 1.1.0 |
Review Comment:
> This API will not create data directory actually and it's the user's
responsibility to create and manage the data directory.
You should state which value (`true` or `false`) causes this behavior.