mchades commented on code in PR #9173: URL: https://github.com/apache/gravitino/pull/9173#discussion_r2608892760
##########
docs/lance-rest-service.md:
##########
@@ -0,0 +1,394 @@
+---
+title: "Lance REST service"
+slug: /lance-rest-service
+keywords:
+  - Lance REST
+  - Lance datasets
+  - REST API
+license: "This software is licensed under the Apache License version 2."
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+## Overview
+
+The Lance REST service provides a RESTful interface for managing Lance datasets through HTTP endpoints. Introduced in Gravitino version 1.1.0, this service enables seamless interaction with Lance datasets for data operations and metadata management.
+
+The service implements the [Lance REST API specification](https://docs.lancedb.com/api-reference/introduction). For detailed specification documentation, see the [official Lance REST documentation](https://lance.org/format/namespace/rest/catalog-spec/).
+
+### What is Lance?
+
+[Lance](https://lance.org/format/) is a modern columnar data format designed for AI/ML workloads. It provides:
+
+- **High-performance vector search**: Native support for similarity search on high-dimensional embeddings
+- **Columnar storage**: Optimized for analytical queries and machine learning pipelines
+- **Fast random access**: Efficient row-level operations unlike traditional columnar formats
+- **Version control**: Built-in dataset versioning and time-travel capabilities
+- **Incremental updates**: Append and update data without full rewrites
+
+### Architecture
+
+The Lance REST service acts as a bridge between Lance datasets and applications:
+
+```
+┌─────────────────┐
+│  Applications   │
+│  (Python/Java)  │
+└────────┬────────┘
+         │ HTTP/REST
+         ▼
+┌─────────────────┐
+│   Lance REST    │◄──── Gravitino Metalake
+│    Service      │      (Metadata Backend)
+└────────┬────────┘
+         │ File System Operations
+         ▼
+┌─────────────────┐
+│ Lance Datasets  │
+│ (S3/GCS/Local)  │
+└─────────────────┘
+```
+
+**Key Features:**
+- Full compliance with Lance REST API specification
+- Can run standalone or integrated with Gravitino server
+- Support for namespace and table management
+- Index creation and management capabilities (Index operations are not supported in version 1.1.0)
+- Metadata stored in Gravitino for unified governance
+
+## Supported Operations
+
+The Lance REST service provides comprehensive support for namespace management, table management, and index operations. The table below lists all supported operations:
+
+| Operation | Description | HTTP Method | Endpoint Pattern | Since Version |
+|-------------------|-------------------------------------------------------------------|-------------|---------------------------------------|---------------|
+| CreateNamespace | Create a new Lance namespace | POST | `/lance/v1/namespace/{id}/create` | 1.1.0 |
+| ListNamespaces | List all namespaces under a parent namespace | GET | `/lance/v1/namespace/{parent}/list` | 1.1.0 |
+| DescribeNamespace | Retrieve detailed information about a specific namespace | POST | `/lance/v1/namespace/{id}/describe` | 1.1.0 |
+| DropNamespace | Delete a namespace | POST | `/lance/v1/namespace/{id}/drop` | 1.1.0 |
+| NamespaceExists | Check whether a namespace exists | POST | `/lance/v1/namespace/{id}/exists` | 1.1.0 |
+| ListTables | List all tables in a namespace | GET | `/lance/v1/namespace/{id}/table/list` | 1.1.0 |
+| CreateTable | Create a new table in a namespace | POST | `/lance/v1/table/{id}/create` | 1.1.0 |
+| DropTable | Delete a table including both metadata and data | POST | `/lance/v1/table/{id}/drop` | 1.1.0 |
+| TableExists | Check whether a table exists | POST | `/lance/v1/table/{id}/exists` | 1.1.0 |
+| RegisterTable | Register an existing Lance table to a namespace | POST | `/lance/v1/table/{id}/register` | 1.1.0 |
+| DeregisterTable | Unregister a table from a namespace (metadata only, data remains) | POST | `/lance/v1/table/{id}/deregister` | 1.1.0 |
+
+More details, please refer to the [Lance REST API specification](https://lance.org/format/namespace/rest/catalog-spec/)
+
+### Operation Details
+
+Some operations have specific behaviors and modes. Below are important details to consider:
+
+#### Namespace Operations
+
+**CreateNamespace** supports three modes:
+- `create`: Fails if namespace already exists
+- `exist_ok`: Succeeds even if namespace exists
+- `overwrite`: Replaces existing namespace
+
+**DropNamespace** behavior:
+- Recursively deletes all child namespaces and tables
+- Deletes both metadata and Lance data files
+- Operation is irreversible
+
+#### Table Operations
+
+**RegisterTable vs CreateTable**:
+- **RegisterTable**: Links existing Lance datasets into Gravitino catalog without data movement
+- **CreateTable**: Creates new Lance table with schema and write metadata files
+:::
+The `version` field of `CreateTable` response is always null, which stands for the latest version.
+:::
+
+**DropTable vs DeregisterTable**:
+- **DropTable**: Permanently deletes metadata and data files from storage
+- **DeregisterTable**: Removes metadata from Gravitino but preserves Lance data files
+
+
+## Deployment
+
+### Running with Gravitino Server
+
+To enable the Lance REST service within Gravitino server, configure the following properties in your Gravitino configuration file:
+
+| Configuration Property | Description | Default Value | Required | Since Version |
+|-------------------------------------------|------------------------------------------------------------------------------|-------------------------|----------|---------------|
+| `gravitino.auxService.names` | Auxiliary services to run. Include `lance-rest` to enable Lance REST service | iceberg-rest,lance-rest | Yes | 0.2.0 |
+| `gravitino.lance-rest.classpath` | Classpath for Lance REST service, relative to Gravitino home directory | lance-rest-server/libs | Yes | 1.1.0 |
+| `gravitino.lance-rest.httpPort` | Port number for Lance REST service | 9101 | Yes | 1.1.0 |
+| `gravitino.lance-rest.host` | Hostname for Lance REST service | 0.0.0.0 | Yes | 1.1.0 |
+| `gravitino.lance-rest.namespace-backend` | Namespace metadata backend (currently only `gravitino` is supported) | gravitino | Yes | 1.1.0 |
+| `gravitino.lance-rest.gravitino-uri` | Gravitino server URI (required when namespace-backend is `gravitino`) | http://localhost:8090 | Yes | 1.1.0 |
+| `gravitino.lance-rest.gravitino-metalake` | Gravitino metalake name (required when namespace-backend is `gravitino`) | (none) | Yes | 1.1.0 |
+
+**Example Configuration:**
+
+```properties
+gravitino.auxService.names = lance-rest
+gravitino.lance-rest.httpPort = 9101
+gravitino.lance-rest.host = 0.0.0.0
+gravitino.lance-rest.namespace-backend = gravitino
+gravitino.lance-rest.gravitino-uri = http://localhost:8090
+gravitino.lance-rest.gravitino-metalake = my_metalake
+```
+
+### Running Standalone
+
+To run Lance REST service independently without Gravitino server:
+
+```shell
+{GRAVITINO_HOME}/bin/gravitino-lance-rest-server.sh start
+```
+
+Configure the service by editing `{GRAVITINO_HOME}/conf/gravitino-lance-rest-server.conf` or passing command-line arguments:
+
+| Configuration Property | Description | Default Value | Required | Since Version |
+|-------------------------------------------|-----------------------------|-----------------------|----------|---------------|
+| `gravitino.lance-rest.namespace-backend` | Namespace metadata backend | gravitino | Yes | 1.1.0 |
+| `gravitino.lance-rest.gravitino-uri` | Gravitino server URI | http://localhost:8090 | Yes | 1.1.0 |
+| `gravitino.lance-rest.gravitino-metalake` | Gravitino metalake name | (none) | Yes | 1.1.0 |
+| `gravitino.lance-rest.httpPort` | Service port number | 9101 | No | 1.1.0 |
+| `gravitino.lance-rest.host` | Service hostname | 0.0.0.0 | No | 1.1.0 |
+
+:::tip
+In most cases, you only need to configure `gravitino.lance-rest.gravitino-metalake` and other properties can use their default values.
+:::
+
+
+### Running with Docker
+
+Launch Lance REST service using Docker:
+
+```shell
+docker run -d --name lance-rest-service -p 9101:9101 \
+  -e LANCE_REST_GRAVITINO_URI=http://gravitino-host:8090 \

Review Comment:
   Adding comments here would make this clearer, because LRS also has an embedded mode, and using the embedded mode can be faster for users to set up.
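
To make the Supported Operations table in the quoted hunk easier to check, here is a minimal Python sketch that maps each operation to its HTTP method and endpoint pattern and builds a request URL. It is illustrative only: `OPERATIONS` and `build_request` are hypothetical names, the default base URL assumes the documented defaults (`httpPort` 9101 on a local host), and percent-encoding the identifier is an assumption rather than something the hunk specifies.

```python
from urllib.parse import quote

# Method and path pattern per operation, copied from the table in the diff.
OPERATIONS = {
    "CreateNamespace": ("POST", "/lance/v1/namespace/{id}/create"),
    "ListNamespaces": ("GET", "/lance/v1/namespace/{parent}/list"),
    "DescribeNamespace": ("POST", "/lance/v1/namespace/{id}/describe"),
    "DropNamespace": ("POST", "/lance/v1/namespace/{id}/drop"),
    "NamespaceExists": ("POST", "/lance/v1/namespace/{id}/exists"),
    "ListTables": ("GET", "/lance/v1/namespace/{id}/table/list"),
    "CreateTable": ("POST", "/lance/v1/table/{id}/create"),
    "DropTable": ("POST", "/lance/v1/table/{id}/drop"),
    "TableExists": ("POST", "/lance/v1/table/{id}/exists"),
    "RegisterTable": ("POST", "/lance/v1/table/{id}/register"),
    "DeregisterTable": ("POST", "/lance/v1/table/{id}/deregister"),
}


def build_request(operation, object_id, base_url="http://localhost:9101"):
    """Return (method, url) for one operation.

    object_id is treated as an opaque string and percent-encoded
    (an assumption; the hunk does not describe the identifier format).
    """
    method, pattern = OPERATIONS[operation]
    encoded = quote(object_id, safe="")
    # The table uses {id} everywhere except ListNamespaces, which uses {parent}.
    path = pattern.replace("{id}", encoded).replace("{parent}", encoded)
    return method, base_url.rstrip("/") + path


# "my_table" is a placeholder identifier, not a name from the PR.
print(build_request("CreateTable", "my_table"))
```

The sketch only assembles the method and URL; request bodies, such as the `CreateNamespace` mode, are left out because the hunk does not show their JSON shape.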
