mbaurin opened a new pull request, #8257:
URL: https://github.com/apache/gravitino/pull/8257
### What changes were proposed in this pull request?
This PR implements SQL UPDATE/DELETE/MERGE support for Hive/Iceberg catalogs
in the Gravitino Trino connector. The changes are:
- **Core Infrastructure**: Added `GravitinoMergeTableHandle`,
`GravitinoDeleteTableHandle`, and `GravitinoUpdateTableHandle` classes
following the existing INSERT pattern
- **Metadata Operations**: Implemented `beginMerge()`, `finishMerge()`,
`getRowChangeParadigm()`, `getInsertLayout()`, and `getUpdateLayout()` methods
in `GravitinoMetadata`
- **Handle Management**: Enhanced `getMergeRowIdColumnHandle()` with proper
handle wrapping and updated `JsonCodec` for `ConnectorMergeTableHandle`
serialization
- **Testing**: Added comprehensive unit tests
(`GravitinoMergeTableHandleTest`) and integration tests
(`00013_merge_operations.sql`) for MERGE functionality
The implementation uses the merge-based operations of the Trino SPI (Trino 435)
rather than the legacy delete/update methods, so the row change paradigm
reported by `getRowChangeParadigm()` can differ per catalog type.
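
The unwrap, delegate, re-wrap flow behind `beginMerge()` can be sketched as follows. This is a minimal, self-contained illustration and not the code in this PR: `TableHandle` and `MergeTableHandle` are hypothetical stand-ins for the Trino SPI handle interfaces, and every class name below is made up for the example so that it compiles without Trino on the classpath.

```java
import java.util.Objects;

public class MergeHandleSketch {
    // Hypothetical stand-ins for the Trino SPI handle interfaces (assumption).
    interface TableHandle {}

    interface MergeTableHandle {
        TableHandle getTableHandle();
    }

    // Handles as the underlying (for example Iceberg) connector might produce them.
    static final class InternalTableHandle implements TableHandle {
        final String name;

        InternalTableHandle(String name) {
            this.name = name;
        }
    }

    static final class InternalMergeHandle implements MergeTableHandle {
        private final InternalTableHandle table;

        InternalMergeHandle(InternalTableHandle table) {
            this.table = table;
        }

        @Override
        public TableHandle getTableHandle() {
            return table;
        }
    }

    // Gravitino-side table handle: schema/table naming plus the wrapped internal handle.
    static final class WrappedTableHandle implements TableHandle {
        final String schema;
        final String table;
        final TableHandle internal;

        WrappedTableHandle(String schema, String table, TableHandle internal) {
            this.schema = schema;
            this.table = table;
            this.internal = Objects.requireNonNull(internal, "internal");
        }
    }

    // Gravitino-side merge handle: exposes the wrapped table handle to the engine
    // and keeps the internal merge handle so later calls can be delegated.
    static final class WrappedMergeHandle implements MergeTableHandle {
        private final WrappedTableHandle tableHandle;
        private final MergeTableHandle internal;

        WrappedMergeHandle(WrappedTableHandle tableHandle, MergeTableHandle internal) {
            this.tableHandle = Objects.requireNonNull(tableHandle, "tableHandle");
            this.internal = Objects.requireNonNull(internal, "internal");
        }

        @Override
        public TableHandle getTableHandle() {
            return tableHandle;
        }

        MergeTableHandle unwrap() {
            return internal;
        }
    }

    // Sketch of beginMerge: unwrap the table handle, ask the internal connector
    // for its merge handle, then wrap the result again.
    static WrappedMergeHandle beginMerge(WrappedTableHandle table) {
        InternalTableHandle internalTable = (InternalTableHandle) table.internal;
        MergeTableHandle internalMerge = new InternalMergeHandle(internalTable);
        return new WrappedMergeHandle(table, internalMerge);
    }

    public static void main(String[] args) {
        WrappedTableHandle table =
            new WrappedTableHandle("db", "orders", new InternalTableHandle("orders"));
        WrappedMergeHandle merge = beginMerge(table);
        System.out.println(merge.getTableHandle() == table);                   // true
        System.out.println(merge.unwrap().getTableHandle() == table.internal); // true
    }
}
```

The point of the wrapper is that the engine only ever sees Gravitino handles, while the internal connector only ever receives the handles it created itself.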
### Why are the changes needed?
Currently, the Gravitino Trino connector only supports basic INSERT
operations. This limitation prevents users from performing essential data
modification operations like:
1. **Complex data updates** through MERGE statements with conditional logic
2. **Efficient upsert operations** that combine INSERT/UPDATE in a single
statement
3. **Selective row deletion** based on business logic within MERGE operations
4. **Modern data lake patterns** that require transactional UPDATE/DELETE
capabilities
This enhancement enables full DML (Data Manipulation Language) support,
making Gravitino suitable for production data
workflows that require comprehensive data modification capabilities beyond
simple inserts.
### Does this PR introduce _any_ user-facing change?
**Yes**, this PR introduces the following user-facing changes:
1. **New SQL Operations Support**:
```sql
-- MERGE operations with INSERT, UPDATE, DELETE clauses
MERGE INTO target_table t
USING source_table s ON (t.id = s.id)
WHEN MATCHED AND s.status = 'active' THEN
UPDATE SET name = s.name, updated_at = s.updated_at
WHEN MATCHED AND s.status = 'inactive' THEN
DELETE
WHEN NOT MATCHED THEN
INSERT (id, name, status) VALUES (s.id, s.name, s.status);
```
2. **Catalog-Specific Capabilities**:
- Iceberg catalogs: Full MERGE support with the `DELETE_ROW_AND_INSERT_ROW`
paradigm
- Hive catalogs: Transactional ACID table support for MERGE operations
- JDBC catalogs: Native UPDATE/DELETE capabilities where supported
3. **No Breaking Changes**: All existing INSERT functionality remains unchanged
and fully backward compatible
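
For readers unfamiliar with the `DELETE_ROW_AND_INSERT_ROW` paradigm: as far as I understand the Trino SPI, a changed row is translated into a delete of the old row by its row ID plus an insert of the new row. The sketch below only illustrates that idea in plain Java. It does not use the Trino SPI, and `Kind`, `Change`, and `toPhysicalOps` are hypothetical names invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class RowChangeSketch {
    // Logical row changes as the clauses of a MERGE statement would produce them.
    enum Kind { INSERT, UPDATE, DELETE }

    // Hypothetical change record: rowId identifies an existing row,
    // newRow carries the new content.
    static final class Change {
        final Kind kind;
        final String rowId;  // null for INSERT
        final String newRow; // null for DELETE

        Change(Kind kind, String rowId, String newRow) {
            this.kind = kind;
            this.rowId = rowId;
            this.newRow = newRow;
        }
    }

    // Under a delete-and-insert paradigm, an UPDATE becomes
    // DELETE(old rowId) followed by INSERT(new row).
    static List<String> toPhysicalOps(List<Change> changes) {
        List<String> ops = new ArrayList<>();
        for (Change c : changes) {
            switch (c.kind) {
                case INSERT:
                    ops.add("INSERT " + c.newRow);
                    break;
                case DELETE:
                    ops.add("DELETE " + c.rowId);
                    break;
                case UPDATE:
                    ops.add("DELETE " + c.rowId);
                    ops.add("INSERT " + c.newRow);
                    break;
            }
        }
        return ops;
    }

    public static void main(String[] args) {
        List<Change> changes = List.of(
            new Change(Kind.UPDATE, "row-1", "(1, 'alice-renamed')"),
            new Change(Kind.DELETE, "row-2", null),
            new Change(Kind.INSERT, null, "(3, 'carol')"));
        // Prints: [DELETE row-1, INSERT (1, 'alice-renamed'), DELETE row-2, INSERT (3, 'carol')]
        System.out.println(toPhysicalOps(changes));
    }
}
```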
### How was this patch tested?
1. **Unit Testing**:
- Added `GravitinoMergeTableHandleTest` to validate handle creation,
serialization, and basic functionality
- Tests cover handle wrapping, JSON serialization/deserialization, and
internal handle delegation
2. **Integration Testing**:
- Created integration test `00013_merge_operations.sql` for
Iceberg catalogs
- Tests include CREATE TABLE, INSERT, MERGE with UPDATE/DELETE/INSERT
clauses, and result verification
- Validates end-to-end MERGE functionality with realistic data scenarios
3. **Manual Testing**:
- Compilation validation: `./gradlew :trino-connector:trino-connector:compileJava`
- Run integration tests: `./gradlew :trino-connector:integration-test:test`
- Code style validation: `./gradlew spotlessCheck`
4. **Testing Strategy**:
- Positive cases: Successful MERGE operations with various clause
combinations
- Edge cases: Empty source tables, no matching rows, all rows matching
- Error handling: Proper exception handling for unsupported operations
- Performance: Row change paradigm chosen per catalog type

The implementation is designed to fit the existing Gravitino architecture.
It keeps current INSERT operations fully compatible while adding MERGE
capabilities.
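
The expected outcome of the three edge cases can be modelled with a small in-memory sketch of the example MERGE statement (matched and `active` updates the name, matched and `inactive` deletes, not matched inserts). This is not the content of `00013_merge_operations.sql`; `SourceRow` and `merge` are hypothetical names, the target is simplified to an `id -> name` map, and the sketch assumes at most one source row per target `id`.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MergeSemanticsSketch {
    // Hypothetical source row mirroring the id/name/status columns of the example.
    static final class SourceRow {
        final int id;
        final String name;
        final String status;

        SourceRow(int id, String name, String status) {
            this.id = id;
            this.name = name;
            this.status = status;
        }
    }

    // Applies the example MERGE to an in-memory target keyed by id (value = name).
    static Map<Integer, String> merge(Map<Integer, String> target, List<SourceRow> source) {
        Map<Integer, String> result = new LinkedHashMap<>(target);
        for (SourceRow s : source) {
            // Matching is decided against the original target contents.
            if (target.containsKey(s.id)) {
                if ("active".equals(s.status)) {
                    result.put(s.id, s.name);   // WHEN MATCHED AND active THEN UPDATE
                } else if ("inactive".equals(s.status)) {
                    result.remove(s.id);        // WHEN MATCHED AND inactive THEN DELETE
                }
                // Matched rows with any other status are left untouched.
            } else {
                result.put(s.id, s.name);       // WHEN NOT MATCHED THEN INSERT
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, String> target = Map.of(1, "a");

        // Empty source table: target is unchanged.
        System.out.println(merge(target, List.of()).equals(Map.of(1, "a")));

        // No matching rows: every source row is inserted.
        System.out.println(merge(target, List.of(new SourceRow(2, "b", "active")))
            .equals(Map.of(1, "a", 2, "b")));

        // All rows matching: one update, one delete, no insert.
        System.out.println(merge(Map.of(1, "a", 2, "b"),
            List.of(new SourceRow(1, "a2", "active"), new SourceRow(2, "x", "inactive")))
            .equals(Map.of(1, "a2")));
    }
}
```

Each line printed by `main` should be `true` if the model behaves as described.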