GitHub user FANNG1 created a discussion: [DISCUSS] Long-term architecture for 
Lance support in Gravitino

## Background

Gravitino has started to support Lance through the Lance REST server and 
generic lakehouse Lance tables. Recent discussions around authorization, 
credential vending, and Lance path namespace compatibility show that we need to 
align on the long-term architecture before adding more features.

This discussion proposes a long-term direction for Lance support in Gravitino.

## Overall Picture

In the long term, Gravitino should expose two major entry points:

```text
Users / governance systems
  -> Gravitino REST / client
  -> Lance catalog

Compute engines
  -> Gravitino Lance REST server
  -> Lance REST Namespace API
```

The goals are:

- Expose a first-class Lance catalog to users.
- Expose a Lance REST server to engines such as Spark, Ray, and Daft.
- Reuse Gravitino authorization, credential vending, and catalog/table 
lifecycle capabilities.
- Move the Lance table capabilities that currently live in the generic catalog 
into shared internal services.

## Core Decision

The largest architectural decision is whether Gravitino should support 
non-Gravitino namespace backends for Lance.

This proposal assumes that we should support them. In that model, Gravitino 
Lance REST is not only a protocol adapter for Gravitino metadata. It is also an 
engine-facing Lance namespace gateway.

Supported metadata authorities:

```text
path backend      -> Lance path namespace
rest backend      -> remote Lance REST namespace
gravitino backend -> Gravitino metadata
```

If we only support the Gravitino backend, the architecture is simpler:

```text
Lance REST
  -> Gravitino internal dispatcher
  -> Gravitino Lance catalog metadata
```

But that would give up support for existing Lance path namespaces, remote Lance 
REST namespaces, and lightweight standalone Lance REST deployments.

## Proposed Long-Term Model

Introduce a first-class Lance catalog provider:

```properties
provider = lance
namespace-backend = path | rest | gravitino
```

Ignoring nested namespaces for now, the object model can be viewed as:

```text
metalake
  catalog
    namespace
      table
```

Each Lance catalog instance should choose exactly one namespace backend.
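The catalog definition above could be validated when the catalog is loaded. This is a minimal Python sketch that assumes the property keys `provider` and `namespace-backend` are exactly as written in the proposal. `NamespaceBackend`, `LanceCatalogConfig`, and `parse_catalog_config` are hypothetical names. Rejecting a missing backend instead of defaulting to one is also an assumption; the proposal does not settle that.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Mapping


class NamespaceBackend(Enum):
    """Metadata authority behind one Lance catalog (values from the proposal)."""
    PATH = "path"            # Lance path namespace
    REST = "rest"            # remote Lance REST namespace
    GRAVITINO = "gravitino"  # Gravitino metadata


@dataclass(frozen=True)
class LanceCatalogConfig:
    name: str
    backend: NamespaceBackend


def parse_catalog_config(name: str, props: Mapping[str, str]) -> LanceCatalogConfig:
    """Validate catalog properties. The result always carries exactly one backend."""
    provider = props.get("provider")
    if provider != "lance":
        raise ValueError(f"catalog {name!r}: expected provider 'lance', got {provider!r}")
    raw = props.get("namespace-backend")
    if raw is None:
        # Assumption: there is no implicit default backend.
        raise ValueError(f"catalog {name!r}: 'namespace-backend' is required")
    try:
        # Parse into a single enum value, so "path | rest" is rejected.
        backend = NamespaceBackend(raw.strip().lower())
    except ValueError:
        allowed = ", ".join(b.value for b in NamespaceBackend)
        raise ValueError(
            f"catalog {name!r}: unknown namespace-backend {raw!r} (allowed: {allowed})"
        ) from None
    return LanceCatalogConfig(name=name, backend=backend)
```

Because `namespace-backend` parses into a single enum value, each catalog has exactly one backend by construction. A value such as `path | rest` is rejected instead of being read as a list.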

## Backend Semantics

### Path backend

- Metadata comes from Lance dataset metadata.
- Schema is read from the Lance dataset.
- Location is resolved from the namespace root and the table identifier.
- Lance REST can generate temporary credentials based on table location.
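For the path backend, location resolution and credential scoping might work as shown below. This is a hypothetical sketch. The `<root>/<namespace>/<table>.lance` layout and the helper names `resolve_table_location` and `credential_scope` are assumptions for illustration, and the actual Lance path namespace layout may differ.

```python
def resolve_table_location(namespace_root: str, namespace: str, table: str) -> str:
    """Derive a table location from the namespace root and table identifier.

    ASSUMPTION: the <root>/<namespace>/<table>.lance layout is hypothetical;
    the actual Lance path namespace layout may differ.
    """
    for value, label in ((namespace, "namespace"), (table, "table")):
        # Reject empty segments and path tricks so that an identifier cannot
        # point outside the namespace root.
        if not value or "/" in value or value in (".", ".."):
            raise ValueError(f"invalid {label} identifier: {value!r}")
    return f"{namespace_root.rstrip('/')}/{namespace}/{table}.lance"


def credential_scope(location: str) -> str:
    """Prefix that a temporary credential for this table would be limited to."""
    return location.rstrip("/") + "/"
```

The trailing slash in the scope matters. A credential limited to `.../orders.lance/` does not also cover a sibling such as `.../orders.lance_backup`.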

### REST backend

- Metadata comes from a remote Lance REST Namespace service.
- Gravitino Lance REST acts as a gateway.
- Credential vending is delegated to the remote Lance REST service by default.
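With the REST backend, the gateway stores no metadata of its own. By default, it also passes through whatever credentials the remote service returns. The sketch below illustrates that default behavior. `RemoteNamespaceClient`, `CredentialVendor`, and `RestBackendGateway` are hypothetical types. The optional `local_vendor` override is only an assumption about how a non-default configuration might work.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Protocol


@dataclass(frozen=True)
class TableDescription:
    location: str
    storage_options: Dict[str, str] = field(default_factory=dict)


class RemoteNamespaceClient(Protocol):
    """Hypothetical client for the remote Lance REST Namespace service."""
    def describe_table(self, namespace: str, table: str) -> TableDescription: ...


class CredentialVendor(Protocol):
    """Hypothetical local credential vendor, used only if delegation is turned off."""
    def vend(self, location: str) -> Dict[str, str]: ...


class RestBackendGateway:
    def __init__(self, remote: RemoteNamespaceClient,
                 local_vendor: Optional[CredentialVendor] = None) -> None:
        self._remote = remote
        # None means vending stays delegated to the remote service (the default).
        self._local_vendor = local_vendor

    def describe_table(self, namespace: str, table: str) -> TableDescription:
        # Metadata always comes from the remote service; the gateway stores nothing.
        desc = self._remote.describe_table(namespace, table)
        if self._local_vendor is None:
            return desc  # pass the remote storage_options through unchanged
        # Hypothetical override: replace remote credentials with locally vended ones.
        return TableDescription(desc.location, self._local_vendor.vend(desc.location))
```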

### Gravitino backend

- Metadata is stored in Gravitino.
- Lance REST accesses metadata through Gravitino internal dispatchers.
- Authorization and credential configuration come from Gravitino.
- Supported only in auxiliary mode, not standalone mode.

## Deployment Modes

```text
auxiliary mode:
  supports path / rest / gravitino backends

standalone mode:
  supports path / rest backends
  does not support gravitino backend
```

Standalone Lance REST should not directly read the Gravitino entity store or 
act as a generic HTTP proxy for Gravitino metadata.
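The mode/backend matrix can be enforced when a catalog is loaded. A minimal sketch follows. It assumes that, as with Gravitino's Iceberg REST service, auxiliary mode means Lance REST runs inside the Gravitino server process. `DeploymentMode`, `SUPPORTED_BACKENDS`, and `check_backend_allowed` are hypothetical names.

```python
from enum import Enum


class DeploymentMode(Enum):
    AUXILIARY = "auxiliary"    # assumed: Lance REST runs inside the Gravitino server
    STANDALONE = "standalone"  # Lance REST runs as a separate process


# Backends each mode can serve, as listed in the proposal.
SUPPORTED_BACKENDS = {
    DeploymentMode.AUXILIARY: frozenset({"path", "rest", "gravitino"}),
    # No in-process dispatchers, and no direct entity-store reads or
    # HTTP proxying, so the gravitino backend is unavailable here.
    DeploymentMode.STANDALONE: frozenset({"path", "rest"}),
}


def check_backend_allowed(mode: DeploymentMode, backend: str) -> None:
    """Fail fast at catalog load time instead of on the first engine request."""
    allowed = SUPPORTED_BACKENDS[mode]
    if backend not in allowed:
        raise ValueError(
            f"namespace-backend {backend!r} is not supported in {mode.value} mode; "
            f"allowed: {sorted(allowed)}"
        )
```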

## Authorization and Credential Vending

Metadata authorization should be table-oriented:

```text
metalake / lance_catalog / namespace / table
```

When an engine accesses a Gravitino Lance catalog through Lance REST, 
authorization should follow a pattern similar to Iceberg REST:

```text
Lance REST request
  -> resolve catalog / namespace / table
  -> build Gravitino request context and principal
  -> map Lance REST operation to Gravitino metadata privilege
  -> authorize the table under the Gravitino Lance catalog
  -> execute the metadata operation
```
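The authorization steps above can be sketched as follows. The operation names follow the Lance REST Namespace style, but the operation-to-privilege table is illustrative only. `TableIdent`, `RequestContext`, `resolve`, and `handle_request` are hypothetical. The `authorize` callback stands in for Gravitino's real authorizer, whose API is not modeled here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, TypeVar

T = TypeVar("T")

# Illustrative mapping from Lance REST operations to Gravitino privileges.
OPERATION_PRIVILEGES: Dict[str, str] = {
    "DescribeTable": "SELECT_TABLE",
    "QueryTable": "SELECT_TABLE",
    "InsertIntoTable": "MODIFY_TABLE",
    "DeleteFromTable": "MODIFY_TABLE",
}


@dataclass(frozen=True)
class TableIdent:
    metalake: str
    catalog: str
    namespace: str
    table: str

    def securable_name(self) -> str:
        return f"{self.metalake}.{self.catalog}.{self.namespace}.{self.table}"


@dataclass(frozen=True)
class RequestContext:
    principal: str


def resolve(metalake: str, catalog: str, table_id: List[str]) -> TableIdent:
    # Nested namespaces are ignored, so the id must be [namespace, table].
    if len(table_id) != 2:
        raise ValueError(f"expected [namespace, table], got {table_id!r}")
    return TableIdent(metalake, catalog, table_id[0], table_id[1])


def handle_request(operation: str, ident: TableIdent, ctx: RequestContext,
                   authorize: Callable[[str, str, str], bool],
                   execute: Callable[[TableIdent], T]) -> T:
    privilege = OPERATION_PRIVILEGES.get(operation)
    if privilege is None:
        # Fail closed: an unmapped operation is denied, never silently allowed.
        raise PermissionError(f"no privilege mapping for {operation}")
    if not authorize(ctx.principal, ident.securable_name(), privilege):
        raise PermissionError(
            f"{ctx.principal} lacks {privilege} on {ident.securable_name()}")
    return execute(ident)  # only reached after authorization succeeds
```

Because the mapping fails closed, an operation with no entry is denied. As a result, adding a new Lance REST endpoint cannot silently bypass authorization.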

Credential vending should be executed through the Lance REST pipeline:

```text
engine
  -> Lance REST describeTable
  -> resolve table metadata and location
  -> authorize table/data access
  -> generate or fetch temporary credential
  -> return schema, location, storage_options
```
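A describeTable handler that follows this pipeline might be structured as shown below. It is a sketch that makes several assumptions. The backend lookup, authorizer, and credential vendor are injected as plain callables, and the schema is simplified to a column-to-type map. `TableMetadata`, `DescribeTableResponse`, and `describe_table` are hypothetical names, not Gravitino or Lance APIs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class TableMetadata:
    schema: Dict[str, str]  # column name -> type name (simplified)
    location: str


@dataclass(frozen=True)
class DescribeTableResponse:
    schema: Dict[str, str]
    location: str
    storage_options: Dict[str, str]


def describe_table(table_id: List[str], principal: str,
                   load_metadata: Callable[[List[str]], TableMetadata],
                   authorize_access: Callable[[str, List[str], str], bool],
                   vend_credential: Callable[[str], Dict[str, str]]) -> DescribeTableResponse:
    # 1. Resolve table metadata and location from the catalog's backend.
    meta = load_metadata(table_id)
    # 2. Authorize before any credential exists, so denied callers get none.
    if not authorize_access(principal, table_id, meta.location):
        raise PermissionError(f"{principal} may not access {'.'.join(table_id)}")
    # 3. Generate (path/gravitino backends) or fetch (rest backend) a
    #    temporary credential for the table location.
    storage_options = vend_credential(meta.location)
    # 4. Return everything the engine needs to open the dataset directly.
    return DescribeTableResponse(meta.schema, meta.location, dict(storage_options))
```

Because authorization happens before vending, a denied caller never causes a credential to be issued. A test can check this by recording calls to the vendor.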

For the Gravitino backend, metadata authority and policy authority still come 
from Gravitino.

## Role of Generic Catalog

The following should remain as a compatibility path, but not the long-term 
user-facing model:

```text
provider = lakehouse-generic
table_format = lance
```

The main external path should be:

```text
provider = lance
```

The existing Lance capabilities in the generic catalog should be moved into 
shared internal services and reused by both paths.
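Both entry points can use the same implementation. The sketch below assumes a hypothetical `SharedLanceTableService` and a hypothetical `route` helper that receives the `provider` and `table_format` values shown above. It does not decide whether `table_format` is a catalog property or a table property.

```python
from typing import Optional


class SharedLanceTableService:
    """Hypothetical shared internal service holding the Lance table logic."""

    def describe(self, namespace: str, table: str) -> str:
        return f"lance table {namespace}.{table}"


def route(provider: str, table_format: Optional[str],
          service: SharedLanceTableService) -> Optional[SharedLanceTableService]:
    """Return the shared service when a request targets a Lance table."""
    if provider == "lance":
        return service  # long-term, user-facing path
    if provider == "lakehouse-generic" and (table_format or "").lower() == "lance":
        return service  # compatibility path, same implementation underneath
    return None  # not a Lance table; handled elsewhere
```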

GitHub link: https://github.com/apache/gravitino/discussions/11295
