jerryshao opened a new pull request, #9678:
URL: https://github.com/apache/gravitino/pull/9678
### What changes were proposed in this pull request?
This commit implements support for external Delta Lake tables in the generic
lakehouse catalog, allowing users to register and manage metadata for existing
Delta tables.
Features:
- External Delta table registration and metadata management
- Schema stored from user CREATE TABLE request
- Metadata-only DROP operation (preserves physical data)
- Comprehensive validation and error messages
- Integration with Delta Kernel 3.3.0 for table creation
Implementation:
- DeltaConstants: Delta table format constant
- DeltaTableDelegator: ServiceLoader integration for Delta format
- DeltaTableOperations: Table lifecycle operations (242 lines)
  * CREATE: Requires external=true and location properties
  * LOAD: Retrieves metadata from entity store
  * DROP: Removes metadata only, preserves data
  * ALTER: Not supported (throws UnsupportedOperationException)
  * PURGE: Not supported for external tables
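
The lifecycle rules listed above can be sketched roughly as follows. This is a minimal in-memory illustration, not the actual `DeltaTableOperations` code: the class name `ExternalDeltaTableSketch`, the method names, the exception types used for validation failures, and the `HashMap` standing in for the entity store are all assumptions made for the example. Only the `external` and `location` property names and the `UnsupportedOperationException` for ALTER come from the description.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

/** Hypothetical sketch of the external-only lifecycle rules; not the PR's real class. */
final class ExternalDeltaTableSketch {
  // Property keys taken from the PR description.
  static final String EXTERNAL = "external";
  static final String LOCATION = "location";

  // Stand-in for the entity store: table name -> registered properties.
  private final Map<String, Map<String, String>> entityStore = new HashMap<>();

  /** CREATE: registers metadata only when external=true and a location are supplied. */
  void createTable(String name, Map<String, String> properties) {
    if (!"true".equalsIgnoreCase(properties.get(EXTERNAL))) {
      throw new IllegalArgumentException(
          "Only external Delta tables are supported; set external=true");
    }
    String location = properties.get(LOCATION);
    if (location == null || location.trim().isEmpty()) {
      throw new IllegalArgumentException("Property 'location' is required");
    }
    // Store a copy so later changes to the caller's map do not leak in.
    entityStore.put(name, new HashMap<>(properties));
  }

  /** LOAD: returns the stored metadata without reading the Delta log. */
  Map<String, String> loadTable(String name) {
    Map<String, String> properties = entityStore.get(name);
    if (properties == null) {
      throw new NoSuchElementException("Table not found: " + name);
    }
    return properties;
  }

  /** DROP: removes the metadata entry only; files under the location are never touched. */
  boolean dropTable(String name) {
    return entityStore.remove(name) != null;
  }

  /** ALTER: rejected, since schema changes go through Delta Lake APIs directly. */
  void alterTable(String name) {
    throw new UnsupportedOperationException("ALTER is not supported for Delta tables");
  }

  /** PURGE: rejected, since external table data is not owned by the catalog. */
  boolean purgeTable(String name) {
    throw new UnsupportedOperationException("PURGE is not supported for external tables");
  }
}
```

In this sketch a request without `external=true` fails before anything is stored, which mirrors the external-only constraint described in this PR.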
Testing:
- 4 unit tests in TestDeltaTableOperations
- 7 integration tests in CatalogGenericCatalogDeltaIT
  * Physical Delta table creation with Delta Kernel
  * Registration and metadata operations
  * Validation of external-only constraint
  * Verification of metadata-only drop behavior
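
To illustrate what a metadata-only drop check verifies, here is a small self-contained sketch. It does not use Delta Kernel or Gravitino and is not the code in `CatalogGenericCatalogDeltaIT`: the class name `MetadataOnlyDropCheck`, the in-memory `registry` map, and the empty commit file standing in for a real Delta table are assumptions made for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical check that dropping a registration leaves the table files in place. */
final class MetadataOnlyDropCheck {

  /** Returns true when the registration is gone but the files on disk still exist. */
  static boolean dataSurvivesDrop() {
    try {
      // Stand-in for a physical Delta table: a directory holding a _delta_log commit file.
      Path tableDir = Files.createTempDirectory("delta-table-");
      Path logDir = Files.createDirectories(tableDir.resolve("_delta_log"));
      Path commit = Files.createFile(logDir.resolve("00000000000000000000.json"));

      // Stand-in for the catalog's metadata store: table name -> location.
      Map<String, String> registry = new HashMap<>();
      registry.put("t1", tableDir.toString());

      // Metadata-only drop: remove the registration and leave tableDir untouched.
      registry.remove("t1");

      boolean survived = !registry.containsKey("t1") && Files.exists(commit);

      // Clean up the temporary files created by this check.
      Files.delete(commit);
      Files.delete(logDir);
      Files.delete(tableDir);
      return survived;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

A real integration test would presumably assert the same two facts against the catalog and the actual table location: the table can no longer be loaded, and the `_delta_log` directory is still present.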
Documentation:
- lakehouse-generic-delta-table.md: Complete user guide
  * Table operations and examples
  * Data type mappings
  * Troubleshooting and best practices
  * Integration with Spark and Delta Lake APIs
Dependencies:
- Delta Kernel API 3.3.0 and defaults
- Hadoop 3.x for Configuration
Limitations:
- External tables only (managed tables require Delta 4.0 CommitCoordinator)
- ALTER not supported (use Delta Lake APIs directly)
- Schema validation not enforced (user responsibility)
- Partitioning informational only (managed by Delta log)
### Why are the changes needed?
Fix: #9647
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
New unit tests (TestDeltaTableOperations) and integration tests (CatalogGenericCatalogDeltaIT) were added.