mengxianwen1 opened a new pull request, #10980:
URL: https://github.com/apache/gravitino/pull/10980
### What changes were proposed in this pull request?
This PR adds a three-level property inheritance mechanism for
`lance.storage.*` properties (AK/SK, endpoint, etc.) so that Lance tables can
inherit storage credentials from catalog and schema levels instead of requiring
them on every
`createTable` call. The inheritance precedence is: **table > schema >
catalog** (child overrides parent).
Changes:
1. **`GenericCatalogPropertiesMetadata` /
`GenericSchemaPropertiesMetadata`**: Added `lance.storage.*` prefix property
entry with `hidden=true` so credentials are accepted but not exposed in API
responses.
2. **`GenericCatalogOperations`**:
- Store `conf` (catalog properties) in `initialize()` following the
`FilesetCatalogOperations` pattern.
- Added `mergeLanceStorageProperties()` method that merges
`lance.storage.*` from catalog → schema → table with child-overrides-parent
precedence.
- `createTable()` now calls `mergeLanceStorageProperties()` and puts
merged properties into table properties before delegating to format-specific
operations.
3. **`GravitinoLanceTableOperations`** (Lance REST Server path):
- Added `mergeCatalogStorageProperties()` to merge catalog-level
`lance.storage.*` into table properties using `putIfAbsent` (table-level takes
precedence).
- Called in `createTable()` and `registerTable()`.
(`createEmptyTable()` delegates to `createTable()` so it's covered
automatically.)
4. **`LanceTableOperations.openDataset()`**: Fixed a bug where
`openDataset()` did not pass storage options to `Dataset.open()`, causing
`alterTable` and other read operations to fail on cloud-based Lance datasets.
Added `openDataset(String,
Map)` overload and updated `handleLanceTableChange()` to pass storage
options.
### Why are the changes needed?
Currently, only the `location` property has a three-level fallback chain
(table → schema → catalog). The `lance.storage.*` properties (AK/SK, endpoint,
region) must be explicitly specified on every `createTable` call, even though
these
credentials are typically shared across all tables under the same catalog
or schema. This is redundant and error-prone.
Additionally, `LanceTableOperations.openDataset()` lacks storage options,
which means `alterTable` operations fail on cloud-based Lance datasets that
require credentials.
Fix: #10979
### Does this PR introduce _any_ user-facing change?
Yes:
1. **New property keys**: Users can now set `lance.storage.*` properties
at catalog and schema levels (e.g., `lance.storage.s3.access-key-id`,
`lance.storage.s3.endpoint`). These properties are marked as `hidden` and will
not be returned in
API responses.
2. **Behavior change**: When creating a Lance table without
`lance.storage.*` properties, the table will now automatically inherit them
from the schema and catalog levels. Previously, these properties would be
absent, causing creation to
fail for cloud storage backends.
3. **Backward compatible**: Tables that already explicitly set
`lance.storage.*` continue to work unchanged — table-level values always take
precedence.
### How was this patch tested?
(Please test your changes, and provide instructions on how to test it:
- **`TestLanceStoragePropertyInheritance`** (12 tests): Covers the
three-level merge logic including precedence overrides, null handling, and
non-lance-storage property filtering.
- **`TestGravitinoLanceTableOperationsStorageMerge`** (5 tests): Covers
the REST Server path catalog-to-table merge logic.
- **`TestLanceTableOperations`**: Updated mock for `openDataset` signature
change.
- All tests pass: `./gradlew :catalogs:catalog-lakehouse-generic:test
:lance:lance-common:test -PskipITs`
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]