mbaurin opened a new pull request, #8257:
URL: https://github.com/apache/gravitino/pull/8257

   
   ### What changes were proposed in this pull request?
   
   This PR implements SQL UPDATE/DELETE/MERGE support for Hive/Iceberg catalogs 
in the Gravitino Trino connector by:
   
   - **Core Infrastructure**: Added `GravitinoMergeTableHandle`, 
`GravitinoDeleteTableHandle`, and `GravitinoUpdateTableHandle` classes 
following the existing INSERT pattern
   - **Metadata Operations**: Implemented `beginMerge()`, `finishMerge()`, 
`getRowChangeParadigm()`, `getInsertLayout()`, and `getUpdateLayout()` methods 
in `GravitinoMetadata`
   - **Handle Management**: Enhanced `getMergeRowIdColumnHandle()` with proper 
handle wrapping and updated `JsonCodec` for `ConnectorMergeTableHandle` 
serialization
   - **Testing**: Added comprehensive unit tests 
(`GravitinoMergeTableHandleTest`) and integration tests 
(`00013_merge_operations.sql`) for MERGE functionality
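   
   As a rough illustration of the handle-wrapping pattern in the list above, 
the sketch below shows a wrapper that carries an internal connector's merge 
handle and delegates to it. The names `TableHandle`, `MergeTableHandle`, 
`WrappedMergeHandle`, and `unwrap()` are hypothetical stand-ins declared locally 
so the example compiles on its own. They are not the Trino SPI types or the 
classes added in this PR.
   
   ```java
   import java.util.Objects;

   // Hypothetical stand-ins for the SPI handle interfaces, declared locally
   // so the sketch compiles without Trino on the classpath.
   interface TableHandle {}

   interface MergeTableHandle {
       TableHandle getTableHandle();
   }

   // Stand-in for a table handle produced by the internal (Hive or Iceberg) connector.
   record InternalTableHandle(String table) implements TableHandle {}

   // Stand-in for the merge handle produced by the internal connector.
   record InternalMergeHandle(InternalTableHandle tableHandle) implements MergeTableHandle {
       @Override
       public TableHandle getTableHandle() {
           return tableHandle;
       }
   }

   // Wrapper in the spirit of GravitinoMergeTableHandle (assumed shape): it
   // carries the internal handle so the outer connector can hand it back later.
   final class WrappedMergeHandle implements MergeTableHandle {
       private final MergeTableHandle internal;

       WrappedMergeHandle(MergeTableHandle internal) {
           this.internal = Objects.requireNonNull(internal, "internal handle is null");
       }

       // Delegates to the wrapped handle instead of keeping its own table state.
       @Override
       public TableHandle getTableHandle() {
           return internal.getTableHandle();
       }

       // Returns the internal handle, to be passed back to the internal connector.
       MergeTableHandle unwrap() {
           return internal;
       }
   }
   ```
   
   Keeping the wrapper free of its own state means serialization only has to 
round-trip the internal handle, which is the property the unit tests described 
below check.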
   
   The implementation uses the merge-based write operations of the Trino SPI 
(Trino 435) rather than the legacy delete/update methods, and reports a row 
change paradigm per catalog type.
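   
   To make the row change paradigm concrete, here is a small sketch of how one 
UPDATE could reach a connector under each paradigm. The `Paradigm` enum is a 
local stand-in that mirrors the two constants of Trino's `RowChangeParadigm`. 
The string encoding of the operations and the `operationsForUpdate` helper are 
assumptions made for illustration, not how the engine represents them.
   
   ```java
   import java.util.ArrayList;
   import java.util.List;

   // Local stand-in mirroring the two constants of Trino's RowChangeParadigm.
   enum Paradigm { CHANGE_ONLY_UPDATED_COLUMNS, DELETE_ROW_AND_INSERT_ROW }

   final class ParadigmSketch {
       // Lists the row-level operations produced for a single updated row.
       static List<String> operationsForUpdate(Paradigm paradigm, String oldRow, String newRow) {
           List<String> ops = new ArrayList<>();
           switch (paradigm) {
               case DELETE_ROW_AND_INSERT_ROW:
                   // The update is split into removing the old row and writing the new one.
                   ops.add("DELETE " + oldRow);
                   ops.add("INSERT " + newRow);
                   break;
               case CHANGE_ONLY_UPDATED_COLUMNS:
                   // The update is passed through as a single in-place change.
                   ops.add("UPDATE " + oldRow + " -> " + newRow);
                   break;
           }
           return ops;
       }
   }
   ```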
   
   ### Why are the changes needed?
   
   Currently, the Gravitino Trino connector only supports basic INSERT 
operations. This limitation prevents users from performing essential data 
modification operations like:
   
   1. **Complex data updates** through MERGE statements with conditional logic
   2. **Efficient upsert operations** that combine INSERT/UPDATE in a single 
statement
   3. **Selective row deletion** based on business logic within MERGE operations
   4. **Modern data lake patterns** that require transactional UPDATE/DELETE 
capabilities
   
   This enhancement enables full DML (Data Manipulation Language) support, 
making Gravitino suitable for production data workflows that need more than 
simple inserts.
   
   ### Does this PR introduce _any_ user-facing change?
   
   **Yes**, this PR introduces the following user-facing changes:
   
   1. **New SQL Operations Support**:
      ```sql
      -- MERGE operations with INSERT, UPDATE, DELETE clauses
      MERGE INTO target_table t
      USING source_table s ON (t.id = s.id)
      WHEN MATCHED AND s.status = 'active' THEN
        UPDATE SET name = s.name, updated_at = s.updated_at
      WHEN MATCHED AND s.status = 'inactive' THEN
        DELETE
      WHEN NOT MATCHED THEN
        INSERT (id, name, status) VALUES (s.id, s.name, s.status);
      ```
   
   2. **Catalog-Specific Capabilities**:
     - Iceberg catalogs: Full MERGE support with the `DELETE_ROW_AND_INSERT_ROW` 
paradigm
     - Hive catalogs: Transactional ACID table support for MERGE operations
     - JDBC catalogs: Native UPDATE/DELETE capabilities where supported
   3. **No Breaking Changes**: All existing INSERT functionality remains 
unchanged and fully backward compatible
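   
   For readers less familiar with MERGE, the three `WHEN` clauses of the 
example statement can be traced on in-memory data. This is only a sketch of the 
statement's semantics, not connector code: `Row` and `MergeSketch` are 
hypothetical names, the `updated_at` column is left out, and source ids are 
assumed to be unique.
   
   ```java
   import java.util.LinkedHashMap;
   import java.util.Map;

   // A row with the columns used in the MERGE example: id, name, status.
   record Row(int id, String name, String status) {}

   final class MergeSketch {
       // Applies the three WHEN clauses to a copy of the target, keyed by id.
       static Map<Integer, Row> merge(Map<Integer, Row> target, Iterable<Row> source) {
           Map<Integer, Row> result = new LinkedHashMap<>(target);
           for (Row s : source) {
               // Matching is done against the original target (ON t.id = s.id).
               Row t = target.get(s.id());
               if (t == null) {
                   // WHEN NOT MATCHED THEN INSERT (id, name, status)
                   result.put(s.id(), s);
               } else if ("active".equals(s.status())) {
                   // WHEN MATCHED AND s.status = 'active' THEN UPDATE SET name = s.name
                   result.put(t.id(), new Row(t.id(), s.name(), t.status()));
               } else if ("inactive".equals(s.status())) {
                   // WHEN MATCHED AND s.status = 'inactive' THEN DELETE
                   result.remove(t.id());
               }
               // Matched rows with any other status are left untouched.
           }
           return result;
       }
   }
   ```
   
   With a target holding ids 1 and 2 and a source holding an active row for 
id 1, an inactive row for id 2, and a new row for id 3, the result keeps id 1 
with the new name, drops id 2, and gains id 3.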
   
   ### How was this patch tested?
   
   1. Unit Testing:
     - Added `GravitinoMergeTableHandleTest` to validate handle creation, 
serialization, and basic functionality
     - Tests cover handle wrapping, JSON serialization/deserialization, and 
internal handle delegation
   2. Integration Testing:
     - Created integration test `00013_merge_operations.sql` for Iceberg catalogs
     - Tests include CREATE TABLE, INSERT, MERGE with UPDATE/DELETE/INSERT 
clauses, and result verification
     - Validates end-to-end MERGE functionality with realistic data scenarios
   3. Manual Testing:
     - Compilation validation: 
`./gradlew :trino-connector:trino-connector:compileJava`
     - Run integration tests: `./gradlew :trino-connector:integration-test:test`
     - Code style validation: `./gradlew spotlessCheck`
   4. Testing Strategy:
     - Positive cases: Successful MERGE operations with various clause 
combinations
     - Edge cases: Empty source tables, no matching rows, all rows matching
     - Error handling: Proper exception handling for unsupported operations
     - Performance: Row change paradigm optimization per catalog type
   
   The implementation fits the existing Gravitino connector architecture: 
current INSERT operations keep working unchanged, and MERGE support is added 
alongside them.
   

