Hi all,
I'd like to start a discussion on simplifying engine access to Iceberg
REST and Lance REST catalogs managed by Gravitino.
Background
Today, when users want Spark to access Iceberg REST and Lance REST
catalogs managed by Gravitino, they must maintain separate engine-side
catalog configurations that duplicate what Gravitino already knows.
When those catalog settings change in Gravitino, the changes need to be
manually propagated to every engine's configuration.
Proposed Approach
The proposal introduces a provider-level engine-access-mode on the
engine side:

    spark.sql.gravitino.<provider>.engine-access-mode = auto | gravitino | native

With this, users only configure the Gravitino server address and
metalake. The engine connector calls listCatalogsInfo() at startup and
auto-registers the appropriate catalogs by translating existing
Gravitino catalog properties, so no new catalog properties or
server-side APIs are needed.
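
To make the translation step a bit more concrete, here is a rough
Python sketch of the idea. It is only an illustration, not the
connector's code, and it rests on several assumptions that the design
document may settle differently: catalogs are modelled as plain dicts
(a stand-in for what listCatalogsInfo() returns), "iceberg" is used as
the <provider> value, the Gravitino catalog exposes its REST endpoint
under a "uri" property, the default mode is "auto", and "auto" picks
native registration whenever a "uri" is present. The helper names
(access_mode, native_iceberg_rest_conf, plan_registrations) are
hypothetical. The emitted spark.sql.catalog.* keys follow the usual
Iceberg Spark catalog settings.

```python
from typing import Dict, List

VALID_MODES = {"auto", "gravitino", "native"}


def access_mode(conf: Dict[str, str], provider: str) -> str:
    """Read the provider-level mode. Defaulting to "auto" is an assumption."""
    key = f"spark.sql.gravitino.{provider}.engine-access-mode"
    mode = conf.get(key, "auto").strip().lower()
    if mode not in VALID_MODES:
        raise ValueError(f"{key} must be one of {sorted(VALID_MODES)}, got {mode!r}")
    return mode


def native_iceberg_rest_conf(catalog: Dict) -> Dict[str, str]:
    """Translate one catalog into Spark settings for Iceberg's own REST client.

    The "uri" property name on the Gravitino side is an assumption.
    """
    prefix = f"spark.sql.catalog.{catalog['name']}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",
        f"{prefix}.uri": catalog["properties"]["uri"],
    }


def plan_registrations(conf: Dict[str, str], catalogs: List[Dict]) -> Dict[str, str]:
    """Return the Spark settings to add for natively registered catalogs.

    Assumed semantics: "native" always registers the engine's own catalog,
    "gravitino" leaves the catalog to the Gravitino connector (nothing is
    emitted here), and "auto" goes native only when a REST "uri" is known.
    Lance REST is left out to keep the sketch short.
    """
    planned: Dict[str, str] = {}
    mode = access_mode(conf, "iceberg")
    for catalog in catalogs:
        if catalog["provider"] != "iceberg":
            continue
        has_uri = "uri" in catalog["properties"]
        if mode == "native" or (mode == "auto" and has_uri):
            planned.update(native_iceberg_rest_conf(catalog))
    return planned
```

With a catalog named "sales" whose properties contain a "uri", the
"native" and "auto" modes would both yield the three
spark.sql.catalog.sales* entries, while "gravitino" would yield none.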
While the first phase focuses on Spark (Iceberg REST and Lance REST),
the same engine-access-mode semantics are designed to extend naturally
to Flink, Trino, Doris, Daft, and other engines.
The full design document is in the PR for review:
https://github.com/apache/gravitino/pull/11280
Looking forward to hearing your thoughts.
Best,
Xiaojing