This is an automated email from the ASF dual-hosted git repository. yuqi4733 pushed a commit to branch internal-main in repository https://gitbox.apache.org/repos/asf/gravitino.git
commit e64ef7450340ac8927ce2d1773066eae8a8267e6 Author: Mini Yu <[email protected]> AuthorDate: Thu Dec 11 15:23:49 2025 +0800 [#9169] improvement(docs): Add docs about Lance REST service and Generic Lakehouse catalog (#9173) ### What changes were proposed in this pull request? Add docs about Lance REST service. ### Why are the changes needed? Let users deploy the Lance REST service easily. Fix: #9169 Fix: #9405 ### Does this PR introduce _any_ user-facing change? N/A. ### How was this patch tested? N/A. --- conf/gravitino.conf.template | 2 +- docs/docker-image-details.md | 4 +- docs/index.md | 6 + docs/lakehouse-generic-catalog.md | 202 +++++++++++ docs/lakehouse-generic-lance-table.md | 304 ++++++++++++++++ docs/lance-rest-service.md | 401 +++++++++++++++++++++ docs/manage-relational-metadata-using-gravitino.md | 50 +-- 7 files changed, 944 insertions(+), 25 deletions(-) diff --git a/conf/gravitino.conf.template b/conf/gravitino.conf.template index 7867576a0b..029e1d2121 100644 --- a/conf/gravitino.conf.template +++ b/conf/gravitino.conf.template @@ -115,4 +115,4 @@ gravitino.lance-rest.namespace-backend = gravitino # The uri of the Lance REST service gravitino namespace backend gravitino.lance-rest.gravitino-uri = http://localhost:8090 # The metalake name used for Lance REST service gravitino namespace backend, please create the metalake first before using it, and configure the metalake name here. -# gravitino.lance-rest.gravitino.metalake = metalake +# gravitino.lance-rest.gravitino-metalake = metalake diff --git a/docs/docker-image-details.md b/docs/docker-image-details.md index a98f845a14..0976325fa6 100644 --- a/docs/docker-image-details.md +++ b/docs/docker-image-details.md @@ -149,7 +149,7 @@ You can deploy the standalone Gravitino Lance REST server with the Docker image. 
 ```shell
-docker run --rm -d -p 9102:9102 -e LANCE_REST_GRAVITINO_METALAKE_NAME=test -e LANCE_REST_PORT=9102 apache/gravitino-lance-rest:latest
+docker run --rm -d -p 9102:9102 -e LANCE_REST_GRAVITINO_METALAKE_NAME=test -e LANCE_REST_GRAVITINO_URI=http://gravitino-host:port -e LANCE_REST_PORT=9102 apache/gravitino-lance-rest:latest
 ```

 Memory settings
@@ -159,7 +159,7 @@ Use `GRAVITINO_MEM` to size the JVM (default `-Xms1024m -Xmx1024m -XX:MaxMetaspa
 Currently, Gravitino Lance REST server supports setting the following environment variables
 - LANCE_REST_GRAVITINO_METALAKE_NAME: It will overwrite the configuration "gravitino.lance-rest.gravitino-metalake" in configuration file `conf/gravitino-lance-rest-server.conf`. **You should set it to your Gravitino metalake name.**
 - LANCE_REST_NAMESPACE_BACKEND: It will overwrite the configuration "gravitino.lance-rest.namespace-backend" in configuration file `conf/gravitino-lance-rest-server.conf`. The default value is "gravitino" and you should not change it as of now.
-- LANCE_REST_GRAVITINO_URI: It will overwrite the configuration "gravitino.lance-rest.gravitino-uri" in configuration file `conf/gravitino-lance-rest-server.conf`. The default value is "http://localhost:8090" and you can change it to your Gravitino server address.
+- LANCE_REST_GRAVITINO_URI: It will overwrite the configuration "gravitino.lance-rest.gravitino-uri" in configuration file `conf/gravitino-lance-rest-server.conf`. The default value is "http://localhost:8090" and you can change it to your Gravitino server address. **Be aware that the Gravitino server URI `http://localhost:8090` is a docker container internal address; if your Gravitino server is running outside the docker container, you should set it to your host IP address like `http://host-i [...]
 - LANCE_REST_HOST: It will overwrite the configuration "gravitino.lance-rest.host" in configuration file `conf/gravitino-lance-rest-server.conf`. The default value is `0.0.0.0`.
- LANCE_REST_PORT: It will overwrite the configuration "gravitino.lance-rest.httpPort" in configuration file `conf/gravitino-lance-rest-server.conf`. The default value is `9101`. diff --git a/docs/index.md b/docs/index.md index 8283ae089c..0871384270 100644 --- a/docs/index.md +++ b/docs/index.md @@ -85,6 +85,7 @@ Gravitino currently supports the following catalogs: * [**PostgreSQL catalog**](./jdbc-postgresql-catalog.md) * [**OceanBase catalog**](./jdbc-oceanbase-catalog.md) * [**StarRocks catalog**](./jdbc-starrocks-catalog.md) +* [**Lakehouse generic catalog**](./lakehouse-generic-catalog.md) If you want to operate table and partition statistics, you can refer to the [document](./manage-statistics-in-gravitino.md). @@ -134,6 +135,7 @@ Gravitino supports different catalogs to manage the metadata in different source * [Paimon catalog](./lakehouse-paimon-catalog.md): a complete guide to using Gravitino to manage Apache Paimon data. * [PostgreSQL catalog](./jdbc-postgresql-catalog.md): a complete guide to using Gravitino to manage PostgreSQL data. * [OceanBase catalog](./jdbc-oceanbase-catalog.md): a complete guide to using Gravitino to manage OceanBase data. +* [Lakehouse generic catalog](./lakehouse-generic-catalog.md): a complete guide to using Gravitino to manage lakehouse data sources. ### Governance @@ -151,6 +153,10 @@ Gravitino provides governance features to manage metadata in a unified way. See: * [Iceberg REST catalog service](./iceberg-rest-service.md): a guide to using Gravitino as an Apache Iceberg REST catalog service. +### Gravitino Lance REST catalog service +* [Lance REST catalog service](./lance-rest-service.md): a guide to using Gravitino + as a Lance REST catalog service. 
+ ### Connectors #### Trino connector diff --git a/docs/lakehouse-generic-catalog.md b/docs/lakehouse-generic-catalog.md new file mode 100644 index 0000000000..3abc474ac9 --- /dev/null +++ b/docs/lakehouse-generic-catalog.md @@ -0,0 +1,202 @@ +--- +title: "Generic Lakehouse Catalog" +slug: /lakehouse-generic-catalog +keywords: + - lakehouse + - lance + - metadata + - generic catalog + - file system +license: "This software is licensed under the Apache License version 2." +--- + +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; + +## Overview + +The Generic Lakehouse Catalog is a Gravitino catalog implementation designed to seamlessly integrate with lakehouse storage systems built on file system-based architectures. This catalog enables unified metadata management for lakehouse tables stored on various storage backends, providing a consistent interface for data discovery, governance, and access control. + +Currently, Gravitino fully supports the **Lance** lakehouse format, with plans to extend support to additional formats in the future. + +### Why Use Generic Lakehouse Catalog? + +1. **Unified Metadata Management**: Single source of truth for table metadata across multiple storage backends +2. **Multi-Format Support**: Extensible architecture to support various lakehouse table formats such as Lance, Iceberg, Hudi, etc. +3. **Storage Flexibility**: Work with any file system, local, or cloud object stores +4. **Gravitino Integration**: Leverage Gravitino's metadata management, access control, lineage tracking, and data discovery +5. 
**Easy Migration**: Register existing lakehouse tables without data movement + +## Catalog Management + +### Capabilities + +The Generic Lakehouse Catalog provides comprehensive relational metadata management capabilities equivalent to standard relational catalogs: + +**Supported Operations:** +- ✅ Create, read, update, and delete catalogs +- ✅ List all catalogs in a metalake +- ✅ Manage catalog properties and metadata +- ✅ Set and modify catalog locations +- ✅ Configure storage backend credentials + +For detailed information on available operations, see [Manage Relational Metadata Using Gravitino](./manage-relational-metadata-using-gravitino.md). + +### Catalog Properties + +| Property | Description | Example | Required | Since Version | +|------------|----------------------------------------------|-------------------------|----------|---------------| +| `provider` | Catalog provider type | `lakehouse-generic` | Yes | 1.1.0 | +| `location` | Root storage path for all schemas and tables | `s3://bucket/lakehouse` | No | 1.1.0 | + +#### Key Property: `location` + +The `location` property specifies the root directory for the lakehouse table. All schemas and tables are stored under this location unless explicitly overridden at the schema or table level. + +**Location Resolution Hierarchy:** +1. Table-level `location` (highest priority) +2. Schema-level `location`, then the location of the table will be `{schema_location}/{table_name}` +3. Catalog-level `location` (fallback), then the location of the table will be `{catalog_location}/{schema_name}/{table_name}` + +**Example Location Hierarchy:** +``` +Case1: only catalog location is set +Catalog location: hdfs://namenode:9000/lakehouse +└── Schema: sales + ├── Table: orders. Final location of table: hdfs://namenode:9000/lakehouse/sales/orders + └── Table: customers. 
Final location of table: hdfs://namenode:9000/lakehouse/sales/customers + +case2: schema location is set, overriding catalog location and table location is not set +Catalog location: hdfs://namenode:9000/lakehouse +└── Schema: sales: s3://sales-bucket/data + ├── Table: orders. Final location of table: s3://sales-bucket/data/orders + └── Table: customers. Final location of table: s3://sales-bucket/data/customers + +case3: table location is set, overriding both schema and catalog locations +Catalog location: hdfs://namenode:9000/lakehouse +└── Schema: sales: s3://sales-bucket/data + ├── Table: orders. Table location: s3://sales-bucket/my_orders, Final location of table: s3://sales-bucket/my_orders + └── Table: customers. Table location: s3://sales-bucket/my_customers, Final location of table: s3://sales-bucket/my_customers + +``` + +### Creating a Catalog + +Use `provider: "lakehouse-generic"` when creating a generic lakehouse catalog. + +<Tabs groupId='language' queryString> +<TabItem value="shell" label="Shell"> + +```shell +curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \ + -H "Content-Type: application/json" -d '{ + "name": "generic_lakehouse_catalog", + "type": "RELATIONAL", + "comment": "Generic lakehouse catalog for Lance datasets", + "provider": "lakehouse-generic", + "properties": { + "location": "hdfs://localhost:9000/user/lakehouse" + } +}' http://localhost:8090/api/metalakes/metalake/catalogs +``` + +</TabItem> +<TabItem value="java" label="Java"> + +```java +GravitinoClient gravitinoClient = GravitinoClient + .builder("http://127.0.0.1:8090") + .withMetalake("metalake") + .build(); + +Map<String, String> catalogProperties = ImmutableMap.<String, String>builder() + .put("location", "hdfs://localhost:9000/user/lakehouse") + .build(); + +Catalog catalog = gravitinoClient.createCatalog( + "generic_lakehouse_catalog", + Type.RELATIONAL, + "lakehouse-generic", + "Generic lakehouse catalog for Lance datasets", + catalogProperties +); +``` + 
+</TabItem>
+</Tabs>
+
+Other catalog operations are the same as for relational catalogs. See [Catalog Operations](./manage-relational-metadata-using-gravitino.md#catalog-operations) for detailed documentation.
+
+## Schema Management
+
+### Capabilities
+
+Schema operations follow the same patterns as relational catalogs:
+
+**Supported Operations:**
+- ✅ Create schemas with custom properties
+- ✅ List all schemas in a catalog
+- ✅ Load schema metadata and properties
+- ✅ Update schema properties
+- ✅ Delete schemas
+- ✅ Check schema existence
+
+See [Schema Operations](./manage-relational-metadata-using-gravitino.md#schema-operations) for detailed documentation.
+
+### Schema Properties
+
+Schemas inherit catalog properties and can override specific settings:
+
+| Property   | Description                                              | Example                      | Required | Since Version |
+|------------|----------------------------------------------------------|------------------------------|----------|---------------|
+| `location` | Custom storage root path for all tables under the schema | `s3://bucket/path_to_schema` | No       | 1.1.0         |
+
+For the location resolution hierarchy, see [Key Property: `location`](#key-property-location) in the Catalog Management section.
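The location resolution hierarchy referenced above (table-level location first, then schema-level, then catalog-level) can be sketched as a small shell function. This is only an illustration of the documented precedence, not Gravitino's implementation: the function name `resolve_table_location` and its argument order are hypothetical, and stripping a trailing `/` before joining path segments is an assumption made here for tidy output.

```shell
# Hypothetical helper (not part of Gravitino): mirrors the documented precedence
#   table location > schema location > catalog location.
# Arguments: catalog_location schema_location table_location schema_name table_name
# Pass an empty string for any level that has no location set.
resolve_table_location() (
  catalog_loc=$1 schema_loc=$2 table_loc=$3
  schema_name=$4 table_name=$5

  if [ -n "$table_loc" ]; then
    # Table-level location wins and is used as-is
    printf '%s\n' "$table_loc"
  elif [ -n "$schema_loc" ]; then
    # {schema_location}/{table_name}; trailing-slash handling is an assumption
    printf '%s\n' "${schema_loc%/}/${table_name}"
  elif [ -n "$catalog_loc" ]; then
    # {catalog_location}/{schema_name}/{table_name}
    printf '%s\n' "${catalog_loc%/}/${schema_name}/${table_name}"
  else
    echo "no location set at catalog, schema, or table level" >&2
    return 1
  fi
)
```

For instance, `resolve_table_location hdfs://namenode:9000/lakehouse s3://sales-bucket/data "" sales orders` follows the schema-level branch and yields a path under `s3://sales-bucket/data`.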
+
+### Schema Operations
+
+**Creating a Schema:**
+
+<Tabs groupId='language' queryString>
+<TabItem value="shell" label="Shell">
+
+```shell
+curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
+  -H "Content-Type: application/json" -d '{
+  "name": "sales",
+  "comment": "Sales department data",
+  "properties": {
+    "location": "s3://sales-bucket/data",
+    "owner": "sales-team"
+  }
+}' http://localhost:8090/api/metalakes/metalake/catalogs/generic_lakehouse_catalog/schemas
+```
+
+</TabItem>
+<TabItem value="java" label="Java">
+
+```java
+Map<String, String> schemaProperties = ImmutableMap.<String, String>builder()
+    .put("location", "s3://sales-bucket/data")
+    .put("owner", "sales-team")
+    .build();
+
+catalog.asSchemas().createSchema(
+    "sales",
+    "Sales department data",
+    schemaProperties
+);
+```
+
+</TabItem>
+</Tabs>
+
+For additional operations, refer to [Schema Operations documentation](./manage-relational-metadata-using-gravitino.md#schema-operations).
+
+## Table Management
+
+### Supported Operations
+
+Since different lakehouse table formats have varying capabilities, table operation support may differ. The following pages describe table operations for each supported lakehouse format.
+
+- [Lance Format Support](./lakehouse-generic-lance-table.md)
\ No newline at end of file
diff --git a/docs/lakehouse-generic-lance-table.md b/docs/lakehouse-generic-lance-table.md
new file mode 100644
index 0000000000..174e5f9a8c
--- /dev/null
+++ b/docs/lakehouse-generic-lance-table.md
@@ -0,0 +1,304 @@
+---
+title: "Lance table support"
+slug: /lance-table-support
+keywords:
+- lakehouse
+- lance
+- metadata
+- generic catalog
+license: "This software is licensed under the Apache License version 2."
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+
+## Overview
+
+This document describes how to use Apache Gravitino to manage a generic lakehouse catalog using Lance as the underlying table format.
+ + +## Table Management + +### Supported Operations + +For Lance tables in a Generic Lakehouse Catalog, the following table summarizes supported operations: + +| Operation | Support Status | +|-----------|-----------------| +| List | ✅ Full | +| Load | ✅ Full | +| Alter | Not support now | +| Create | ✅ Full | +| Register | ✅ Full | +| Drop | ✅ Full | +| Purge | ✅ Full | + +:::note Feature Limitations +- **Partitioning:** Not currently supported +- **Sort Orders:** Not currently supported +- **Distributions:** Not currently supported +- **Indexes:** Not currently supported +::: + +### Data Type Mappings + +Lance uses Apache Arrow for table schemas. The following table shows type mappings between Gravitino and Arrow: + +| Gravitino Type | Arrow Type | +|----------------------------------|-----------------------------------------| +| `Struct` | `Struct` | +| `Map` | `Map` | +| `List` | `Array` | +| `Boolean` | `Boolean` | +| `Byte` | `Int8` | +| `Short` | `Int16` | +| `Integer` | `Int32` | +| `Long` | `Int64` | +| `Float` | `Float` | +| `Double` | `Double` | +| `String` | `Utf8` | +| `Binary` | `Binary` | +| `Decimal(p, s)` | `Decimal(p, s)` (128-bit) | +| `Date` | `Date` | +| `Timestamp`/`Timestamp(6)` | `TimestampType withoutZone` | +| `Timestamp(0)` | `TimestampType Second withoutZone` | +| `Timestamp(3)` | `TimestampType Millisecond withoutZone` | +| `Timestamp(9)` | `TimestampType Nanosecond withoutZone` | +| `Timestamp_tz`/`Timestamp_tz(6)` | `TimestampType Microsecond withUtc` | +| `Timestamp_tz(0)` | `TimestampType Second withUtc` | +| `Timestamp_tz(3)` | `TimestampType Millisecond withUtc` | +| `Timestamp_tz(9)` | `TimestampType Nanosecond withUtc` | +| `Time`/`Time(9)` | `Time Nanosecond` | +| `Null` | `Null` | +| `Fixed(n)` | `Fixed-Size Binary(n)` | +| `Interval_year` | `Interval(YearMonth)` | +| `Interval_day` | `Duration(Microsecond)` | +| `External(arrow_field_json_str)` | Any Arrow Field | + +### External Type Support + +For Arrow types not natively 
mapped in Gravitino, use the `External(arrow_field_json_str)` type, which accepts a JSON string representation of an Arrow `Field`. + +**Requirements:** +- JSON must conform to Apache Arrow [Field specification](https://github.com/apache/arrow-java/blob/ed81e5981a2bee40584b3a411ed755cb4cc5b91f/vector/src/main/java/org/apache/arrow/vector/types/pojo/Field.java#L80C1-L86C68) +- `name` attribute must match column name exactly +- `nullable` attribute must match column nullability +- `children` array: + - Empty for primitive types + - Contains child field definitions for complex types (Struct, List) + +**Examples:** + +| Arrow Type | External Type Definition | +|-------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| `Large Utf8` | `External("{\"name\":\"col_name\",\"nullable\":true,\"type\":{\"name\":\"largeutf8\"},\"children\":[]}")` | +| `Large Binary` | `External("{\"name\":\"col_name\",\"nullable\":true,\"type\":{\"name\":\"largebinary\"},\"children\":[]}")` | +| `Large List` | `External("{\"name\":\"col_name\",\"nullable\":true,\"type\":{\"name\":\"largelist\"},\"children\":[{\"name\":\"element\",\"nullable\":true,\"type\":{\"name\":\"int\",\"bitWidth\":32,\"isSigned\":true},\"children\":[]}]}")` | +| `Fixed-Size List` | `External("{\"name\":\"col_name\",\"nullable\":true,\"type\":{\"name\":\"fixedsizelist\",\"listSize\":10},\"children\":[{\"name\":\"element\",\"nullable\":true,\"type\":{\"name\":\"int\",\"bitWidth\":32,\"isSigned\":true},\"children\":[]}]}")` | + +### Table Properties + +Required and optional properties for tables in a Generic Lakehouse Catalog: + +| Property | Description | Default | Required | Since Version | 
+|-----------------------|-------------|----------|--------------|---------------|
+| `format` | Table format. Currently only `lance` is fully supported. | (none) | Yes | 1.1.0 |
+| `location` | Storage path for table metadata and data. Lance currently supports S3, GCS, OSS, AZ, File, Memory, and file-object-store. | (none) | Conditional* | 1.1.0 |
+| `external` | Whether the data directory is an external location. If `true`, dropping the table removes only the metadata in Gravitino and keeps the data directory, while purging the table deletes both. For a non-external table, dropping deletes both metadata and data. | false | No | 1.1.0 |
+| `lance.creation-mode` | Creation mode. When creating a table, it can be `CREATE`, `EXIST_OK`, or `OVERWRITE`; when registering a table, it must be `CREATE` or `OVERWRITE`. | `CREATE` | No | 1.1.0 |
+| `lance.register` | Whether this is a register-table operation. If `true`, the API does not create the data directory, and the user is responsible for creating and managing it. If `false`, the API creates the table. | false | No | 1.1.0 |
+| `lance.storage.xxxx` | Any additional storage-specific properties required by the Lance format (e.g., S3 credentials, HDFS configs). Replace `xxxx` with the actual property name. For example, use `lance.storage.aws_access_key_id` to set the S3 `aws_access_key_id` when using an S3 location. For details, see https://lancedb.com/docs/storage/integrations/ | (none) | No | 1.1.0 |
+
+The `lance.creation-mode` values behave as follows:
+
+- `CREATE`: Create a new table; fail if the table already exists.
+- `EXIST_OK`: Create a new table if it does not exist; otherwise do nothing.
+- `OVERWRITE`: Create a new table, overwrite if the table already exists, it will delete the existing data directory first if the table is not a registered table and then create a new one. + +**Location Requirement:** Must be specified at catalog, schema, or table level. See [Location Resolution](./lakehouse-generic-catalog.md#key-property-location). + +You may also set additional properties specific to your lakehouse format or custom requirements. + +### Table Operations + +Table operations follow standard relational catalog patterns. See [Table Operations](./manage-relational-metadata-using-gravitino.md#table-operations) for comprehensive documentation. + +The following sections provide examples and important details for working with Lance tables. + +#### Creating a Lance Table + +<Tabs groupId='language' queryString> +<TabItem value="shell" label="Shell"> + +```shell +curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \ + -H "Content-Type: application/json" -d '{ + "name": "lance_table", + "comment": "Example Lance table", + "columns": [ + { + "name": "id", + "type": "integer", + "comment": "Primary identifier", + "nullable": false + } + ], + "properties": { + "format": "lance", + "location": "/tmp/lance_catalog/schema/lance_table" + } +}' http://localhost:8090/api/metalakes/test/catalogs/generic_lakehouse_lance_catalog/schemas/schema/tables +``` + +</TabItem> +<TabItem value="java" label="Java"> + +```java +Catalog catalog = gravitinoClient.loadCatalog("generic_lakehouse_lance_catalog"); +TableCatalog tableCatalog = catalog.asTableCatalog(); + +Map<String, String> tableProperties = ImmutableMap.<String, String>builder() + .put("format", "lance") + .put("location", "/tmp/lance_catalog/schema/example_table") + .build(); + +tableCatalog.createTable( + NameIdentifier.of("schema", "lance_table"), + new Column[] { + Column.of("id", Types.IntegerType.get(), "Primary identifier", + true, false, null) + }, + "Example Lance table", + tableProperties, + null, // 
partitions + null, // distributions + null, // sortOrders + null // indexes +); +``` + +</TabItem> +</Tabs> + +#### Registering External Tables + +Register existing Lance tables without moving or copying data: + +<Tabs groupId='language' queryString> +<TabItem value="shell" label="Shell"> + +```shell +curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \ + -H "Content-Type: application/json" -d '{ + "name": "register_lance_table", + "comment": "Registered existing Lance table", + "columns": [], + "properties": { + "format": "lance", + "lance.register": "true", + "location": "/tmp/lance_catalog/schema/existing_lance_table" + } +}' http://localhost:8090/api/metalakes/test/catalogs/generic_lakehouse_lance_catalog/schemas/schema/tables +``` + +</TabItem> +<TabItem value="java" label="Java"> + +```java +Catalog catalog = gravitinoClient.loadCatalog("generic_lakehouse_lance_catalog"); +TableCatalog tableCatalog = catalog.asTableCatalog(); + +Map<String, String> registerProperties = ImmutableMap.<String, String>builder() + .put("format", "lance") + .put("lance.register", "true") + .put("location", "/tmp/lance_catalog/schema/existing_lance_table") + .build(); + +tableCatalog.createTable( + NameIdentifier.of("schema", "register_lance_table"), + new Column[] {}, // Schema auto-detected from existing table + "Registered existing Lance table", + registerProperties, + null, null, null, null +); +``` + +</TabItem> +</Tabs> + +:::tip Registration vs Creation +- **Registration** (`lance.register: true`): + - Links to existing Lance dataset or a path placeholder + - Schema automatically detected from Lance metadata + - Useful for importing existing datasets + +- **Creation** (default): + - Creates new Lance table from scratch + - Requires column schema definition + - Initializes new Lance dataset files +::: + +## Advanced Topics + +### Troubleshooting + +#### Common Issues + +**Issue: "Location not specified" error** +``` +Solution: Ensure at least one level 
(catalog/schema/table) specifies the location property
+```
+
+**Issue: Permission denied errors**
+```
+Solution: Check file system permissions and credentials for the storage backend
+```
+
+**Issue: Table not found after registration**
+```
+Solution: Verify the location path points to a valid Lance dataset directory
+```
+
+### Migration Guide
+
+#### Migrating Existing Lance Tables
+
+1. **Inventory**: List all existing Lance table locations
+2. **Create Catalog**: Create a Generic Lakehouse catalog pointing to the root location
+3. **Register Tables**: Use the register operation for each table
+4. **Verify**: Confirm all tables are accessible through Gravitino
+5. **Update Clients**: Point applications to Gravitino metadata instead of direct Lance access
+
+**Example Migration Script:**
+
+```shell
+# Existing Lance tables to register, one "schema table location" triple per entry
+tables_to_migrate=(
+  "sales orders /data/sales/orders"
+  "sales customers /data/sales/customers"
+  "inventory products /data/inventory/products"
+)
+
+# Register each table
+for entry in "${tables_to_migrate[@]}"; do
+  # Split the entry into its three whitespace-separated fields
+  read -r schema table location <<< "$entry"
+  echo "Registering ${schema}.${table} from ${location}"
+
+  curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
+    -H "Content-Type: application/json" -d "{
+    \"name\": \"${table}\",
+    \"comment\": \"Registered existing Lance table\",
+    \"columns\": [],
+    \"properties\": {
+      \"format\": \"lance\",
+      \"lance.register\": \"true\",
+      \"location\": \"${location}\"
+    }
+  }" "http://localhost:8090/api/metalakes/test/catalogs/generic_lakehouse_lance_catalog/schemas/${schema}/tables"
+
+  echo "Registered ${schema}.${table}"
+done
+```
+
+Other supported table operations (list, load, drop, purge) follow standard relational catalog patterns. See [Table Operations](./manage-relational-metadata-using-gravitino.md#table-operations) for details.
+ diff --git a/docs/lance-rest-service.md b/docs/lance-rest-service.md new file mode 100644 index 0000000000..4b0db579f1 --- /dev/null +++ b/docs/lance-rest-service.md @@ -0,0 +1,401 @@ +--- +title: "Lance REST service" +slug: /lance-rest-service +keywords: + - Lance REST + - Lance datasets + - REST API +license: "This software is licensed under the Apache License version 2." +--- + +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; + +## Overview + +The Lance REST service provides a RESTful interface for managing Lance datasets through HTTP endpoints. Introduced in Gravitino version 1.1.0, this service enables seamless interaction with Lance datasets for data operations and metadata management. + +The service implements the [Lance REST API specification](https://docs.lancedb.com/api-reference/introduction). For detailed specification documentation, see the [official Lance REST documentation](https://lance.org/format/namespace/rest/catalog-spec/). + +### What is Lance? + +[Lance](https://lance.org/format/) is a modern columnar data format designed for AI/ML workloads. 
It provides: + +- **High-performance vector search**: Native support for similarity search on high-dimensional embeddings +- **Columnar storage**: Optimized for analytical queries and machine learning pipelines +- **Fast random access**: Efficient row-level operations unlike traditional columnar formats +- **Version control**: Built-in dataset versioning and time-travel capabilities +- **Incremental updates**: Append and update data without full rewrites + +### Architecture + +The Lance REST service acts as a bridge between Lance datasets and applications: + +``` +┌─────────────────┐ +│ Applications │ +│ (Python/Java) │ +└────────┬────────┘ + │ HTTP/REST + ▼ +┌─────────────────┐ +│ Lance REST │ +│ Service │ +└────────┬────────┘ + │ + ▼ Gravitino Client API +┌─────────────────┐ +│ Gravitino Server │ +│(Metadata Backend)│ +└────────┬────────┘ + │ File System Operations + ▼ +┌─────────────────┐ +│ Lance Datasets │ +│ (S3/GCS/Local) │ +└─────────────────┘ +``` + +**Key Features:** +- Full compliance with Lance REST API specification +- Can run standalone or integrated with Gravitino server +- Support for namespace and table management +- Index creation and management capabilities (Index operations are not supported in version 1.1.0) +- Metadata stored in Gravitino for unified governance + +## Supported Operations + +The Lance REST service provides comprehensive support for namespace management, table management, and index operations. 
The table below lists all supported operations:
+
+| Operation         | Description                                                       | HTTP Method | Endpoint Pattern                      | Since Version |
+|-------------------|-------------------------------------------------------------------|-------------|---------------------------------------|---------------|
+| CreateNamespace   | Create a new Lance namespace                                      | POST        | `/lance/v1/namespace/{id}/create`     | 1.1.0         |
+| ListNamespaces    | List all namespaces under a parent namespace                      | GET         | `/lance/v1/namespace/{parent}/list`   | 1.1.0         |
+| DescribeNamespace | Retrieve detailed information about a specific namespace          | POST        | `/lance/v1/namespace/{id}/describe`   | 1.1.0         |
+| DropNamespace     | Delete a namespace                                                | POST        | `/lance/v1/namespace/{id}/drop`       | 1.1.0         |
+| NamespaceExists   | Check whether a namespace exists                                  | POST        | `/lance/v1/namespace/{id}/exists`     | 1.1.0         |
+| ListTables        | List all tables in a namespace                                    | GET         | `/lance/v1/namespace/{id}/table/list` | 1.1.0         |
+| CreateTable       | Create a new table in a namespace                                 | POST        | `/lance/v1/table/{id}/create`         | 1.1.0         |
+| DropTable         | Delete a table including both metadata and data                   | POST        | `/lance/v1/table/{id}/drop`           | 1.1.0         |
+| TableExists       | Check whether a table exists                                      | POST        | `/lance/v1/table/{id}/exists`         | 1.1.0         |
+| RegisterTable     | Register an existing Lance table to a namespace                   | POST        | `/lance/v1/table/{id}/register`       | 1.1.0         |
+| DeregisterTable   | Unregister a table from a namespace (metadata only, data remains) | POST        | `/lance/v1/table/{id}/deregister`     | 1.1.0         |
+
+For more details, refer to the [Lance REST API specification](https://lance.org/format/namespace/rest/catalog-spec/).
+
+### Operation Details
+
+Some operations have specific behaviors and modes.
Below are important details to consider: + +#### Namespace Operations + +**CreateNamespace** supports three modes: +- `create`: Fails if namespace already exists +- `exist_ok`: Succeeds even if namespace exists +- `overwrite`: Replaces existing namespace + +**DropNamespace** behavior: +- Recursively deletes all child namespaces and tables +- Deletes both metadata and Lance data files +- Operation is irreversible + +#### Table Operations + +**RegisterTable vs CreateTable**: +- **RegisterTable**: Links existing Lance datasets into Gravitino catalog without data movement +- **CreateTable**: Creates new Lance table with schema and write metadata files +:::note +The `version` field of `CreateTable` response is always null, which stands for the latest version. +::: + +**DropTable vs DeregisterTable**: +- **DropTable**: Permanently deletes metadata and data files from storage +- **DeregisterTable**: Removes metadata from Gravitino but preserves Lance data files + + +## Deployment + +### Running with Gravitino Server + +To enable the Lance REST service within Gravitino server, configure the following properties in your Gravitino configuration file `${GRAVITINO_HOME}/conf/gravitino.conf`: + +| Configuration Property | Description | Default Value | Required | Since Version | +|-------------------------------------------|------------------------------------------------------------------------------|-------------------------|----------|---------------| +| `gravitino.auxService.names` | Auxiliary services to run. 
Include `lance-rest` to enable Lance REST service | iceberg-rest,lance-rest | Yes      | 0.2.0         |
+| `gravitino.lance-rest.classpath`          | Classpath for Lance REST service, relative to Gravitino home directory        | lance-rest-server/libs  | Yes      | 1.1.0         |
+| `gravitino.lance-rest.httpPort`           | Port number for Lance REST service                                            | 9101                    | No       | 1.1.0         |
+| `gravitino.lance-rest.host`               | Hostname for Lance REST service                                               | 0.0.0.0                 | No       | 1.1.0         |
+| `gravitino.lance-rest.namespace-backend`  | Namespace metadata backend (currently only `gravitino` is supported)          | gravitino               | Yes      | 1.1.0         |
+| `gravitino.lance-rest.gravitino-uri`      | Gravitino server URI (required when namespace-backend is `gravitino`)         | http://localhost:8090   | Yes      | 1.1.0         |
+| `gravitino.lance-rest.gravitino-metalake` | Gravitino metalake name (required when namespace-backend is `gravitino`)      | (none)                  | Yes      | 1.1.0         |
+
+**Example Configuration:**
+
+```properties
+gravitino.auxService.names = lance-rest
+gravitino.lance-rest.httpPort = 9101
+gravitino.lance-rest.host = 0.0.0.0
+gravitino.lance-rest.namespace-backend = gravitino
+gravitino.lance-rest.gravitino-uri = http://localhost:8090
+gravitino.lance-rest.gravitino-metalake = my_metalake
+```
+
+### Running Standalone
+
+To run the Lance REST service as a standalone process, separate from the Gravitino server (a Gravitino server must already be running, because it is the metadata backend):
+
+```shell
+${GRAVITINO_HOME}/bin/gravitino-lance-rest-server.sh start
+```
+
+Configure the service by editing `${GRAVITINO_HOME}/conf/gravitino-lance-rest-server.conf` or passing command-line arguments:
+
+| Configuration Property                    | Description                | Default Value         | Required | Since Version |
+|-------------------------------------------|----------------------------|-----------------------|----------|---------------|
+| `gravitino.lance-rest.namespace-backend`  | Namespace metadata backend | gravitino             | Yes      | 1.1.0         |
+| `gravitino.lance-rest.gravitino-uri`      | Gravitino server URI       | http://localhost:8090 | Yes      | 1.1.0         |
+| 
`gravitino.lance-rest.gravitino-metalake` | Gravitino metalake name | (none) | Yes | 1.1.0 | +| `gravitino.lance-rest.httpPort` | Service port number | 9101 | No | 1.1.0 | +| `gravitino.lance-rest.host` | Service hostname | 0.0.0.0 | No | 1.1.0 | + +:::tip +In most cases, you only need to configure `gravitino.lance-rest.gravitino-metalake`; the other properties can keep their default values. +::: + + +### Running with Docker + +Launch the Lance REST service using Docker (a Gravitino server must already be running): + +```shell +docker run -d --name lance-rest-service -p 9101:9101 \ + -e LANCE_REST_GRAVITINO_URI=http://gravitino-host:8090 \ + -e LANCE_REST_GRAVITINO_METALAKE_NAME=your_metalake_name \ + apache/gravitino-lance-rest:latest +``` + +Access the service at `http://localhost:9101`. + +**Environment Variables:** + +| Environment Variable | Configuration Property | Required | Default Value | Since Version | +|--------------------------------------|-------------------------------------------|----------|-------------------------|---------------| +| `LANCE_REST_NAMESPACE_BACKEND` | `gravitino.lance-rest.namespace-backend` | Yes | `gravitino` | 1.1.0 | +| `LANCE_REST_GRAVITINO_METALAKE_NAME` | `gravitino.lance-rest.gravitino-metalake` | Yes | (none) | 1.1.0 | +| `LANCE_REST_GRAVITINO_URI` | `gravitino.lance-rest.gravitino-uri` | Yes | `http://localhost:8090` | 1.1.0 | +| `LANCE_REST_HOST` | `gravitino.lance-rest.host` | No | `0.0.0.0` | 1.1.0 | +| `LANCE_REST_PORT` | `gravitino.lance-rest.httpPort` | No | `9101` | 1.1.0 | + +:::tip Configuration Tips +- **Required:** Set `LANCE_REST_GRAVITINO_METALAKE_NAME` to your Gravitino metalake name +- **Conditional:** Set `LANCE_REST_GRAVITINO_URI` if the Gravitino server is not reachable at `localhost` from inside the Docker container. 
+- **Optional:** Other variables can use default values unless you have specific requirements +::: + +## Usage Guidelines + +When using Lance REST service with Gravitino backend, keep the following considerations in mind: + +### Prerequisites +- A running Gravitino server with a created metalake + +### Namespace Hierarchy +Gravitino follows a three-level hierarchy: **catalog → schema → table**. When creating namespaces or tables: + +1. **Parent must exist:** Before creating `lance_catalog/schema`, ensure `lance_catalog` catalog exists in Gravitino metalake. +2. **Two-level limit:** You can create namespace `lance_catalog/schema`, but **not** `lance_catalog/schema/sub_schema`. +3. **Table placement:** Tables can only be created under `lance_catalog/schema`, not at catalog level. + +**Example Hierarchy:** +``` +metalake +└── lance_catalog (catalog - create via REST) + └── schema (namespace - create via REST) + └── table01 (table - create via REST) +``` + +### Delimiter Convention + +The Lance REST API uses `$` as the default delimiter to separate namespace levels in URIs. When making HTTP requests: + +- **URL Encoding Required**: `$` must be URL-encoded as `%24` +- **Example**: `lance_catalog$schema$table01` becomes `lance_catalog%24schema%24table01` in URLs + +**Common Delimiters:** +``` +Namespace path: lance_catalog.schema.table01 +URI representation: lance_catalog$schema$table01 +URL encoded: lance_catalog%24schema%24table01 +``` + +:::caution Important Limitations +- Currently supports only **two levels of namespaces** before tables +- Tables **cannot** be nested deeper than schema level +- Parent catalog must be created in Gravitino before using Lance REST API +- Metadata operations require Gravitino server to be available +- Namespace deletion is recursive and irreversible +::: + +## Examples + +The following examples demonstrate how to interact with Lance REST service using different programming languages and tools. 
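As a small illustration of the delimiter convention described above, the sketch below joins namespace levels with `$` and percent-encodes the result for use in a request path. The helper name `encode_namespace_path` is hypothetical and not part of any Lance or Gravitino client; it relies only on the Python standard library.

```python
from urllib.parse import quote


def encode_namespace_path(levels, delimiter="$"):
    """Join namespace levels with the delimiter and percent-encode the result.

    Hypothetical helper for illustration only.
    """
    # safe="" makes quote() encode every reserved character, including "$".
    return quote(delimiter.join(levels), safe="")


encoded = encode_namespace_path(["lance_catalog", "schema", "table01"])
print(encoded)  # lance_catalog%24schema%24table01
```

The encoded string is what goes into the path segment of a request URL; the JSON body still carries the levels as a plain list.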
+ +**Prerequisites:** +- Gravitino server is running with Lance REST service enabled. +- A metalake has been created in Gravitino. + +<Tabs groupId="language" queryString> +<TabItem value="shell" label="Shell"> + +```shell +# Create a catalog-level namespace +# mode: "create" | "exist_ok" | "overwrite" for create namespace/table; mode: "create" | "overwrite" for register table +curl -X POST http://localhost:9101/lance/v1/namespace/lance_catalog/create \ + -H 'Content-Type: application/json' \ + -d '{ + "id": ["lance_catalog"], + "mode": "create" + }' + +# Create a schema namespace +# Note: %24 is URL-encoded '$' character used as delimiter +curl -X POST http://localhost:9101/lance/v1/namespace/lance_catalog%24schema/create \ + -H 'Content-Type: application/json' \ + -d '{ + "id": ["lance_catalog", "schema"], + "mode": "create" + }' + +# Register an existing table +curl -X POST http://localhost:9101/lance/v1/table/lance_catalog%24schema%24table01/register \ + -H 'Content-Type: application/json' \ + -d '{ + "id": ["lance_catalog", "schema", "table01"], + "location": "/tmp/lance_catalog/schema/table01", + "mode": "CREATE" + }' + +# Create a new empty table +curl -X POST http://localhost:9101/lance/v1/table/lance_catalog%24schema%24table02/create-empty \ + -H 'Content-Type: application/json' \ + -d '{ + "id": ["lance_catalog", "schema", "table02"], + "location": "/tmp/lance_catalog/schema/table02", + "properties": { "description": "This is table02" } + }' + +# Create a table with schema, the schema is inferred from the Arrow IPC file +curl -X POST \ + "http://localhost:9101/lance/v1/table/lance_catalog%24schema%24table03/create" \ + -H 'Content-Type: application/vnd.apache.arrow.stream' \ + -H "x-lance-table-location: /tmp/lance_catalog/schema/table03" \ + -H "x-lance-table-properties: {}" \ + --data-binary "@${ARROW_FILE}" +``` + +</TabItem> +<TabItem value="java" label="Java"> + +```java +// Add dependency: implementation("com.lancedb:lance-namespace-core:0.0.20") + 
+import org.apache.arrow.memory.BufferAllocator; +import org.apache.arrow.memory.RootAllocator; +import java.util.HashMap; +import java.util.Map; + +// Initialize allocator and namespace connection +private final BufferAllocator allocator = new RootAllocator(Long.MAX_VALUE); + +Map<String, String> props = new HashMap<>(); +props.put(RestNamespaceConfig.URI, "http://localhost:9101/lance"); +props.put(RestNamespaceConfig.DELIMITER, RestNamespaceConfig.DELIMITER_DEFAULT); + +LanceNamespace ns = LanceNamespaces.connect("rest", props, null, allocator); + +// Create catalog namespace +CreateNamespaceRequest createCatalogNsRequest = new CreateNamespaceRequest(); +createCatalogNsRequest.addIdItem("lance_catalog"); +createCatalogNsRequest.setMode(CreateNamespaceRequest.ModeEnum.CREATE); +ns.createNamespace(createCatalogNsRequest); + +// Create schema namespace +CreateNamespaceRequest createSchemaNsRequest = new CreateNamespaceRequest(); +createSchemaNsRequest.addIdItem("lance_catalog"); +createSchemaNsRequest.addIdItem("schema"); +createSchemaNsRequest.setMode(CreateNamespaceRequest.ModeEnum.CREATE); +ns.createNamespace(createSchemaNsRequest); + +// Register a table +RegisterTableRequest registerTableRequest = new RegisterTableRequest(); +registerTableRequest.setLocation("/tmp/lance_catalog/schema/table01"); +registerTableRequest.setId(Lists.newArrayList("lance_catalog", "schema", "table01")); +registerTableRequest.setMode(RegisterTableRequest.ModeEnum.CREATE); +ns.registerTable(registerTableRequest); + +// Create an empty table +CreateEmptyTableRequest createEmptyTableRequest = new CreateEmptyTableRequest(); +createEmptyTableRequest.setLocation("/tmp/lance_catalog/schema/table02"); +createEmptyTableRequest.setId(Lists.newArrayList("lance_catalog", "schema", "table02")); +ns.createEmptyTable(createEmptyTableRequest); + +// Create a table with schema inferred from Arrow IPC file +CreateTableRequest createTableRequest = new CreateTableRequest(); 
+createTableRequest.setId(Lists.newArrayList("lance_catalog", "schema", "table03")); +createTableRequest.setLocation("/tmp/lance_catalog/schema/table03"); +org.apache.arrow.vector.types.pojo.Schema schema = + new org.apache.arrow.vector.types.pojo.Schema( + Arrays.asList( + Field.nullable("id", new ArrowType.Int(32, true)), + Field.nullable("value", new ArrowType.Utf8()))); +byte[] body = ArrowUtils.generateIpcStream(schema); +ns.createTable(createTableRequest, body); + +``` + +</TabItem> +<TabItem value="python" label="Python"> + +```python +# Install: pip install lance-namespace==0.0.20 + +import lance_namespace as ln + +# Connect to Lance REST service +ns = ln.connect("rest", {"uri": "http://your_lance_rest:9101/lance"}) + +# Create catalog namespace +create_catalog_ns_request = ln.CreateNamespaceRequest(id=["lance_catalog"]) +catalog = ns.create_namespace(create_catalog_ns_request) + +# Create schema namespace +create_schema_ns_request = ln.CreateNamespaceRequest(id=["lance_catalog", "schema"]) +schema = ns.create_namespace(create_schema_ns_request) + +# Register a table +register_table_request = ln.RegisterTableRequest( + id=['lance_catalog', 'schema', 'table01'], + location='/tmp/lance_catalog/schema/table01' +) +ns.register_table(register_table_request) + +# Create an empty table +create_empty_table_request = ln.CreateEmptyTableRequest( + id=['lance_catalog', 'schema', 'table02'], + location='/tmp/lance_catalog/schema/table02' +) +ns.create_empty_table(create_empty_table_request) + +# Create a table with schema inferred from Arrow IPC file +create_table_request = ln.CreateTableRequest( + id=['lance_catalog', 'schema', 'table03'], + location='/tmp/lance_catalog/schema/table03' +) +with open('schema.ipc', 'rb') as f: + body = f.read() + +ns.create_table(create_table_request, body) +``` + +</TabItem> +</Tabs> diff --git a/docs/manage-relational-metadata-using-gravitino.md b/docs/manage-relational-metadata-using-gravitino.md index a05d9393f1..a1fdfa87f5 100644 --- 
a/docs/manage-relational-metadata-using-gravitino.md +++ b/docs/manage-relational-metadata-using-gravitino.md @@ -27,6 +27,7 @@ For more details, please refer to the related doc. - [**Apache Iceberg**](./lakehouse-iceberg-catalog.md) - [**Apache Paimon**](./lakehouse-paimon-catalog.md) - [**Apache Hudi**](./lakehouse-hudi-catalog.md) +- [**Lakehouse generic**](./lakehouse-generic-catalog.md) If you want to operate table and partition statistics, you can refer to the [document](./manage-statistics-in-gravitino.md). @@ -111,17 +112,18 @@ gravitino_client.create_catalog(name="catalog", Currently, Gravitino supports the following catalog providers: -| Catalog provider | Catalog property | -|---------------------|--------------------------------------------------------------------------------| -| `hive` | [Hive catalog property](./apache-hive-catalog.md#catalog-properties) | -| `lakehouse-iceberg` | [Iceberg catalog property](./lakehouse-iceberg-catalog.md#catalog-properties) | -| `lakehouse-paimon` | [Paimon catalog property](./lakehouse-paimon-catalog.md#catalog-properties) | -| `lakehouse-hudi` | [Hudi catalog property](./lakehouse-hudi-catalog.md#catalog-properties) | -| `jdbc-mysql` | [MySQL catalog property](./jdbc-mysql-catalog.md#catalog-properties) | -| `jdbc-postgresql` | [PostgreSQL catalog property](./jdbc-postgresql-catalog.md#catalog-properties) | -| `jdbc-doris` | [Doris catalog property](./jdbc-doris-catalog.md#catalog-properties) | -| `jdbc-oceanbase` | [OceanBase catalog property](./jdbc-oceanbase-catalog.md#catalog-properties) | -| `jdbc-starrocks` | [StarRocks catalog property](./jdbc-starrocks-catalog.md#catalog-properties) | +| Catalog provider | Catalog property | +|---------------------|-----------------------------------------------------------------------------------------| +| `hive` | [Hive catalog property](./apache-hive-catalog.md#catalog-properties) | +| `lakehouse-iceberg` | [Iceberg catalog 
property](./lakehouse-iceberg-catalog.md#catalog-properties) | +| `lakehouse-paimon` | [Paimon catalog property](./lakehouse-paimon-catalog.md#catalog-properties) | +| `lakehouse-hudi` | [Hudi catalog property](./lakehouse-hudi-catalog.md#catalog-properties) | +| `jdbc-mysql` | [MySQL catalog property](./jdbc-mysql-catalog.md#catalog-properties) | +| `jdbc-postgresql` | [PostgreSQL catalog property](./jdbc-postgresql-catalog.md#catalog-properties) | +| `jdbc-doris` | [Doris catalog property](./jdbc-doris-catalog.md#catalog-properties) | +| `jdbc-oceanbase` | [OceanBase catalog property](./jdbc-oceanbase-catalog.md#catalog-properties) | +| `jdbc-starrocks` | [StarRocks catalog property](./jdbc-starrocks-catalog.md#catalog-properties) | +| `lakehouse-generic` | [Lakehouse generic catalog property](./lakehouse-generic-catalog.md#catalog-properties) | ### Load a catalog @@ -506,6 +508,7 @@ Currently, Gravitino supports the following schema property: | `jdbc-doris` | [Doris schema property](./jdbc-doris-catalog.md#schema-properties) | | `jdbc-oceanbase` | [OceanBase schema property](./jdbc-oceanbase-catalog.md#schema-properties) | | `jdbc-starrocks` | [StarRocks schema property](./jdbc-starrocks-catalog.md#schema-properties) | +| `lakehouse-generic` | [Lakehouse generic schema property](./lakehouse-generic-catalog.md#schema-properties) | ### Load a schema @@ -990,6 +993,7 @@ The following is a table of the column default value that Gravitino supports for | `jdbc-doris` | ✔ | | `jdbc-oceanbase` | ✔ | | `jdbc-starrocks` | ✔ | +| `lakehouse-generic` | ✘ | #### Table column auto-increment @@ -1007,22 +1011,24 @@ The following table shows the column auto-increment that Gravitino supports for | `jdbc-doris` | ✘ | | `jdbc-oceanbase` | ✔([limitations](./jdbc-oceanbase-catalog.md#table-column-auto-increment)) | | `jdbc-starrocks` | ✔ | +| `lakehouse-generic` | ✘ | #### Table property and type mapping The following is the table property that Gravitino supports: -| Catalog 
provider | Table property | Type mapping | -|---------------------|-----------------------------------------------------------------------------|----------------------------------------------------------------------------| -| `hive` | [Hive table property](./apache-hive-catalog.md#table-properties) | [Hive type mapping](./apache-hive-catalog.md#table-column-types) | -| `lakehouse-iceberg` | [Iceberg table property](./lakehouse-iceberg-catalog.md#table-properties) | [Iceberg type mapping](./lakehouse-iceberg-catalog.md#table-column-types) | -| `lakehouse-paimon` | [Paimon table property](./lakehouse-paimon-catalog.md#table-properties) | [Paimon type mapping](./lakehouse-paimon-catalog.md#table-column-types) | -| `lakehouse-hudi` | [Hudi table property](./lakehouse-hudi-catalog.md#table-properties) | [Hudi type mapping](./lakehouse-hudi-catalog.md#table-column-types) | -| `jdbc-mysql` | [MySQL table property](./jdbc-mysql-catalog.md#table-properties) | [MySQL type mapping](./jdbc-mysql-catalog.md#table-column-types) | -| `jdbc-postgresql` | [PostgreSQL table property](./jdbc-postgresql-catalog.md#table-properties) | [PostgreSQL type mapping](./jdbc-postgresql-catalog.md#table-column-types) | -| `jdbc-doris` | [Doris table property](./jdbc-doris-catalog.md#table-properties) | [Doris type mapping](./jdbc-doris-catalog.md#table-column-types) | -| `jdbc-oceanbase` | [OceanBase table property](./jdbc-oceanbase-catalog.md#table-properties) | [OceanBase type mapping](./jdbc-oceanbase-catalog.md#table-column-types) | -| `jdbc-starrocks` | [StarRocks table property](./jdbc-starrocks-catalog.md#table-properties) | [StarRocks type mapping](./jdbc-starrocks-catalog.md#table-column-types) | +| Catalog provider | Table property | Type mapping | 
+|---------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------| +| `hive` | [Hive table property](./apache-hive-catalog.md#table-properties) | [Hive type mapping](./apache-hive-catalog.md#table-column-types) | +| `lakehouse-iceberg` | [Iceberg table property](./lakehouse-iceberg-catalog.md#table-properties) | [Iceberg type mapping](./lakehouse-iceberg-catalog.md#table-column-types) | +| `lakehouse-paimon` | [Paimon table property](./lakehouse-paimon-catalog.md#table-properties) | [Paimon type mapping](./lakehouse-paimon-catalog.md#table-column-types) | +| `lakehouse-hudi` | [Hudi table property](./lakehouse-hudi-catalog.md#table-properties) | [Hudi type mapping](./lakehouse-hudi-catalog.md#table-column-types) | +| `jdbc-mysql` | [MySQL table property](./jdbc-mysql-catalog.md#table-properties) | [MySQL type mapping](./jdbc-mysql-catalog.md#table-column-types) | +| `jdbc-postgresql` | [PostgreSQL table property](./jdbc-postgresql-catalog.md#table-properties) | [PostgreSQL type mapping](./jdbc-postgresql-catalog.md#table-column-types) | +| `jdbc-doris` | [Doris table property](./jdbc-doris-catalog.md#table-properties) | [Doris type mapping](./jdbc-doris-catalog.md#table-column-types) | +| `jdbc-oceanbase` | [OceanBase table property](./jdbc-oceanbase-catalog.md#table-properties) | [OceanBase type mapping](./jdbc-oceanbase-catalog.md#table-column-types) | +| `jdbc-starrocks` | [StarRocks table property](./jdbc-starrocks-catalog.md#table-properties) | [StarRocks type mapping](./jdbc-starrocks-catalog.md#table-column-types) | +| `lakehouse-generic` | Lakehouse generic table property depends on specific table implementation, for Lance 
table, refer to the [Lance table properties](./lakehouse-generic-lance-table.md#table-properties); for other table formats, refer to their respective docs. | Lakehouse generic type mapping also depends on the table format; for Lance tables, refer to the [data type mappings](./lakehouse-generic-lance-table.md#data-type-mappings). | #### Table partitioning, distribution, sort ordering and indexes
