yuqi1129 commented on code in PR #9173:
URL: https://github.com/apache/gravitino/pull/9173#discussion_r2608925272


##########
docs/lakehouse-generic-lance-table.md:
##########
@@ -0,0 +1,294 @@
+---
+title: "Lance table support"
+slug: /lance-table-support
+keywords:
+- lakehouse
+- lance
+- metadata
+- generic catalog
+license: "This software is licensed under the Apache License version 2."
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+
+## Overview
+
+This document describes how to use Apache Gravitino to manage a generic 
lakehouse catalog using Lance as the underlying table format. 
+
+
+## Table Management
+
+### Supported Operations
+
+For Lance tables in a Generic Lakehouse Catalog, the following table 
summarizes supported operations:
+
+| Operation | Support Status  |
+|-----------|-----------------|
+| List      | ✅ Full          |
+| Load      | ✅ Full          |
+| Alter     | Not support now |
+| Create    | ✅ Full          |
+| Register  | ✅ Full          |
+| Drop      | ✅ Full          |
+| Purge     | ✅ Full          |
+
+:::note Feature Limitations
+- **Partitioning:** Not currently supported
+- **Sort Orders:** Not currently supported
+- **Distributions:** Not currently supported
+- **Indexes:** Not currently supported
+  :::
+
+### Data Type Mappings
+
+Lance uses Apache Arrow for table schemas. The following table shows type 
mappings between Gravitino and Arrow:
+
+| Gravitino Type                   | Arrow Type                              |
+|----------------------------------|-----------------------------------------|
+| `Struct`                         | `Struct`                                |
+| `Map`                            | `Map`                                   |
+| `List`                           | `Array`                                 |
+| `Boolean`                        | `Boolean`                               |
+| `Byte`                           | `Int8`                                  |
+| `Short`                          | `Int16`                                 |
+| `Integer`                        | `Int32`                                 |
+| `Long`                           | `Int64`                                 |
+| `Float`                          | `Float`                                 |
+| `Double`                         | `Double`                                |
+| `String`                         | `Utf8`                                  |
+| `Binary`                         | `Binary`                                |
+| `Decimal(p, s)`                  | `Decimal(p, s)` (128-bit)               |
+| `Date`                           | `Date`                                  |
+| `Timestamp`/`Timestamp(6)`       | `TimestampType withoutZone`             |
+| `Timestamp(0)`                   | `TimestampType Second withoutZone`      |
+| `Timestamp(3)`                   | `TimestampType Millisecond withoutZone` |
+| `Timestamp(9)`                   | `TimestampType Nanosecond withoutZone`  |
+| `Timestamp_tz`/`Timestamp_tz(6)` | `TimestampType Microsecond withUtc`     |
+| `Timestamp_tz(0)`                | `TimestampType Second withUtc`          |
+| `Timestamp_tz(3)`                | `TimestampType Millisecond withUtc`     |
+| `Timestamp_tz(9)`                | `TimestampType Nanosecond withUtc`      |
+| `Time`/`Time(9)`                 | `Time Nanosecond`                       |
+| `Null`                           | `Null`                                  |
+| `Fixed(n)`                       | `Fixed-Size Binary(n)`                  |
+| `Interval_year`                  | `Interval(YearMonth)`                   |
+| `Interval_day`                   | `Duration(Microsecond)`                 |
+| `External(arrow_field_json_str)` | Any Arrow Field                         |
+
+### External Type Support
+
+For Arrow types not natively mapped in Gravitino, use the 
`External(arrow_field_json_str)` type, which accepts a JSON string 
representation of an Arrow `Field`.
+
+**Requirements:**
+- JSON must conform to Apache Arrow [Field 
specification](https://github.com/apache/arrow-java/blob/ed81e5981a2bee40584b3a411ed755cb4cc5b91f/vector/src/main/java/org/apache/arrow/vector/types/pojo/Field.java#L80C1-L86C68)
+- `name` attribute must match column name exactly
+- `nullable` attribute must match column nullability
+- `children` array:
+  - Empty for primitive types
+  - Contains child field definitions for complex types (Struct, List)
+
+**Examples:**
+
+| Arrow Type        | External Type Definition                                 
                                                                                
                                                                                
               |
+|-------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| `Large Utf8`      | 
`External("{\"name\":\"col_name\",\"nullable\":true,\"type\":{\"name\":\"largeutf8\"},\"children\":[]}")`
                                                                                
                                               |
+| `Large Binary`    | 
`External("{\"name\":\"col_name\",\"nullable\":true,\"type\":{\"name\":\"largebinary\"},\"children\":[]}")`
                                                                                
                                             |
+| `Large List`      | 
`External("{\"name\":\"col_name\",\"nullable\":true,\"type\":{\"name\":\"largelist\"},\"children\":[{\"name\":\"element\",\"nullable\":true,\"type\":{\"name\":\"int\",\"bitWidth\":32,\"isSigned\":true},\"children\":[]}]}")`
         |
+| `Fixed-Size List` | 
`External("{\"name\":\"col_name\",\"nullable\":true,\"type\":{\"name\":\"fixedsizelist\",\"listSize\":10},\"children\":[{\"name\":\"element\",\"nullable\":true,\"type\":{\"name\":\"int\",\"bitWidth\":32,\"isSigned\":true},\"children\":[]}]}")`
 |
+
+### Table Properties
+
+Required and optional properties for tables in a Generic Lakehouse Catalog:
+
+| Property              | Description                                          
                                                                                
                                                                                
                                                                                
                                           | Default  | Required     | Since 
Version |
+|-----------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------|--------------|---------------|
+| `format`              | Table format: `lance`, currently only `lance` is 
fully supported.                                                                
                                                                                
                                                                                
                                               | (none)   | Yes          | 
1.1.0         |
+| `location`            | Storage path for table metadata and data, Lance 
currently supports: S3, GCS, OSS, AZ, File, Memory and file-object-store.       
                                                                                
                                                                                
                                                | (none)   | Conditional* | 
1.1.0         |
+| `external`            | Whether the data directory is an external location. 
If it's `true`, dropping a table will only remove metadata in Gravitino and 
will not delete the data directory, and purge table will delete both. For a 
non-external table, dropping will drop both.                                    
                                                    | false    | No           | 
1.1.0         |
+| `lance.creation-mode` | Create mode: for create table, it can be `CREATE`, 
`EXIST_OK` or `OVERWRITE`. and it should be `CREATE` or `OVERWRITE` for 
registering tables                                                              
                                                                                
                                                     | `CREATE` | No           
| 1.1.0         |
+| `lance.register`      | Whether it is a register table operation. This API 
will not create data directory actually and it's the user's responsibility to 
create and manage the data directory.                                           
                                                                                
                                               | false    | No           | 
1.1.0         |
+| `lance.storage.xxxx`  | Any additional storage-specific properties required 
by Lance format (e.g., S3 credentials, HDFS configs). Replace `xxxx` with 
actual property names. For example, we can use 
`lance.storage.aws_access_key_id` to set S3 aws_access_key_id when using a S3 
location, for detail, please refer to 
https://lancedb.com/docs/storage/integrations/ | (none)   | No           | 
1.1.0         |
+
+
+**Location Requirement:** Must be specified at catalog, schema, or table 
level. See [Location 
Resolution](./lakehouse-generic-catalog.md#key-property-location).
+
+You may also set additional properties specific to your lakehouse format or 
custom requirements.
+
+### Table Operations
+
+Table operations follow standard relational catalog patterns. See [Table 
Operations](./manage-relational-metadata-using-gravitino.md#table-operations) 
for comprehensive documentation.
+
+The following sections provide examples and important details for working with 
Lance tables. 
+
+#### Creating a Lance Table
+
+<Tabs groupId='language' queryString>
+<TabItem value="shell" label="Shell">
+
+```shell
+curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
+  -H "Content-Type: application/json" -d '{
+  "name": "lance_table",
+  "comment": "Example Lance table",
+  "columns": [
+    {
+      "name": "id",
+      "type": "integer",
+      "comment": "Primary identifier",
+      "nullable": false
+    }
+  ],
+  "properties": {
+    "format": "lance",
+    "location": "/tmp/lance_catalog/schema/lance_table"
+  }
+}' 
http://localhost:8090/api/metalakes/test/catalogs/generic_lakehouse_lance_catalog/schemas/schema/tables
+```
+
+</TabItem>
+<TabItem value="java" label="Java">
+
+```java
+Catalog catalog = 
gravitinoClient.loadCatalog("generic_lakehouse_lance_catalog");
+TableCatalog tableCatalog = catalog.asTableCatalog();
+
+Map<String, String> tableProperties = ImmutableMap.<String, String>builder()
+    .put("format", "lance")
+    .put("location", "/tmp/lance_catalog/schema/example_table")
+    .build();
+
+tableCatalog.createTable(
+    NameIdentifier.of("schema", "lance_table"),
+    new Column[] {
+        Column.of("id", Types.IntegerType.get(), "Primary identifier", 
+                  false, true, Literals.integerLiteral(-1))
+    },
+    "Example Lance table",
+    tableProperties,
+    null,  // partitions
+    null,  // distributions
+    null,  // sortOrders
+    null   // indexes
+);
+```
+
+</TabItem>
+</Tabs>
+
+#### Registering External Tables
+
+Register existing Lance tables without moving or copying data:
+
+<Tabs groupId='language' queryString>
+<TabItem value="shell" label="Shell">
+
+```shell
+curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
+  -H "Content-Type: application/json" -d '{
+  "name": "register_lance_table",
+  "comment": "Registered existing Lance table",
+  "columns": [],
+  "properties": {
+    "format": "lance",
+    "lance.register": "true",
+    "location": "/tmp/lance_catalog/schema/existing_lance_table"
+  }
+}' 
http://localhost:8090/api/metalakes/test/catalogs/generic_lakehouse_lance_catalog/schemas/schema/tables
+```
+
+</TabItem>
+<TabItem value="java" label="Java">
+
+```java
+Catalog catalog = 
gravitinoClient.loadCatalog("generic_lakehouse_lance_catalog");
+TableCatalog tableCatalog = catalog.asTableCatalog();
+
+Map<String, String> registerProperties = ImmutableMap.<String, String>builder()
+    .put("format", "lance")
+    .put("lance.register", "true")
+    .put("location", "/tmp/lance_catalog/schema/existing_lance_table")
+    .build();
+
+tableCatalog.createTable(
+    NameIdentifier.of("schema", "register_lance_table"),
+    new Column[] {},  // Schema auto-detected from existing table
+    "Registered existing Lance table",
+    registerProperties,
+    null, null, null, null
+);
+```
+
+</TabItem>
+</Tabs>
+
+:::tip Registration vs Creation
+- **Registration** (`lance.register: true`):
+  - Links to existing Lance dataset or a path placeholder
+  - Schema automatically detected from Lance metadata
+  - Useful for importing existing datasets
+
+- **Creation** (default):
+  - Creates new Lance table from scratch
+  - Requires column schema definition
+  - Initializes new Lance dataset files
+:::
+
+## Advanced Topics
+
+### Troubleshooting
+
+#### Common Issues
+
+**Issue: "Location not specified" error**
+```
+Solution: Ensure at least one level (catalog/schema/table) specifies the 
location property
+```
+
+**Issue: Permission denied errors**
+```
+Solution: Check file system permissions and credentials for the storage backend
+```
+
+**Issue: Table not found after registration**
+```
+Solution: Verify the location path points to a valid Lance dataset directory
+```
+
+### Migration Guide
+
+#### Migrating Existing Lance Tables
+
+1. **Inventory**: List all existing Lance table locations
+2. **Create Catalog**: Create Generic Lakehouse catalog pointing to root 
location
+3. **Register Tables**: Use register operation for each table
+4. **Verify**: Confirm all tables are accessible through Gravitino
+5. **Update Clients**: Point applications to Gravitino metadata instead of 
direct Lance access
+
+**Example Migration Script:**
+
+```python
+import lance_namespace as ln
+
+# Connect to Lance REST service
+ns = ln.connect("rest", {"uri": "http://localhost:9101/lance"})

Review Comment:
   Do you mean I should not use a specific URI like 
`http://localhost:9101/lance` and use a URI placeholder instead?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to