platinumhamburg opened a new issue, #2133:
URL: https://github.com/apache/fluss/issues/2133

   ### Search before asking
   
   - [x] I searched in the [issues](https://github.com/apache/fluss/issues) and 
found nothing similar.
   
   
   ### Motivation
   
   When using Aggregation Merge Engine with multiple concurrent Flink jobs 
writing to the same primary key table, the following issues occur:
   
   1. **Concurrent Conflicts**: Multiple writers updating the same aggregation 
columns simultaneously cause data inconsistency
   2. **No Exactly-Once Guarantee**: Job failover cannot guarantee aggregation 
accuracy
   3. **Duplicate Calculations**: Data replay after job restart leads to 
duplicate aggregation
   4. **State Loss**: Lack of state management to track committed data
   
   These issues make Aggregation Merge Engine unsuitable for production 
scenarios requiring strict data accuracy.
   
   ### Solution
   
   The complete solution has been proposed in the FIP document [FIP-21: 
Aggregation Merge 
Engine](https://cwiki.apache.org/confluence/display/FLUSS/FIP-21%3A+Aggregation+Merge+Engine).
   
   ### 1. Column Lock Mechanism
   - **Client-side**: `ClientColumnLockManager` manages lock acquisition, 
renewal, and release
   - **Server-side**: `ServerColumnLockService` maintains table-level column 
lock resources
   - **Purpose**: Ensures exclusive access to columns, preventing concurrent 
modification conflicts
   - **Granularity**: Table-level locks with column-level control
   
   ### 2. State Management
   - **WriterState**: Tracks maximum committed offset per TableBucket
   - **BucketOffsetTracker**: Client-side offset tracking
   - **Integration**: Uses Flink's Operator State for persistence across 
checkpoints
   
   ### 3. Undo Recovery
   - **Mechanism**: On failover, undoes uncommitted data written after last 
checkpoint
   - **Implementation**: Reads old values at committed offset and overwrites 
new values
   - **Guarantee**: Ensures data consistency after job restart
   
   ### 4. Recovery Mode
   - **Purpose**: Special connection mode for undo operations
   - **Behavior**: Uses overwrite semantics instead of aggregation during 
recovery
   - **Usage**: Automatically enabled during undo recovery process
   
   ### Anything else?
   
   _No response_
   
   ### Willingness to contribute
   
   - [x] I'm willing to submit a PR!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to