platinumhamburg opened a new issue, #2133: URL: https://github.com/apache/fluss/issues/2133
### Search before asking

- [x] I searched in the [issues](https://github.com/apache/fluss/issues) and found nothing similar.

### Motivation

When multiple concurrent Flink jobs write to the same primary key table using the Aggregation Merge Engine, the following issues occur:

1. **Concurrent Conflicts**: Multiple writers updating the same aggregation columns simultaneously cause data inconsistency
2. **No Exactly-Once Guarantee**: Job failover cannot guarantee aggregation accuracy
3. **Duplicate Calculations**: Data replay after job restart leads to duplicate aggregation
4. **State Loss**: There is no state management to track committed data

These issues make the Aggregation Merge Engine unsuitable for production scenarios requiring strict data accuracy.

### Solution

The complete solution has been proposed in the FIP document [FIP-21: Aggregation Merge Engine](https://cwiki.apache.org/confluence/display/FLUSS/FIP-21%3A+Aggregation+Merge+Engine).

### 1. Column Lock Mechanism

- **Client-side**: `ClientColumnLockManager` manages lock acquisition, renewal, and release
- **Server-side**: `ServerColumnLockService` maintains table-level column lock resources
- **Purpose**: Ensures exclusive access to columns, preventing concurrent modification conflicts
- **Granularity**: Table-level locks with column-level control

### 2. State Management

- **WriterState**: Tracks maximum committed offset per TableBucket
- **BucketOffsetTracker**: Client-side offset tracking
- **Integration**: Uses Flink's Operator State for persistence across checkpoints

### 3. Undo Recovery

- **Mechanism**: On failover, undoes uncommitted data written after last checkpoint
- **Implementation**: Reads old values at committed offset and overwrites new values
- **Guarantee**: Ensures data consistency after job restart

### 4. Recovery Mode

- **Purpose**: Special connection mode for undo operations
- **Behavior**: Uses overwrite semantics instead of aggregation during recovery
- **Usage**: Automatically enabled during undo recovery process

### Anything else?

_No response_

### Willingness to contribute

- [x] I'm willing to submit a PR!
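
To make the State Management, Undo Recovery, and Recovery Mode items above more concrete, here is a minimal in-memory sketch of the idea. It is not the FIP-21 implementation: `UndoRecoverySketch`, `SumBucket`, `LogEntry`, `aggregate`, `overwrite`, and `undo` are hypothetical names, the table is a single bucket with one SUM-aggregated column, and the changelog is assumed to carry the value each key had before the write.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class UndoRecoverySketch {

    /** Hypothetical changelog entry: a key plus its value before the write (null if absent). */
    static class LogEntry {
        final String key;
        final Long before;

        LogEntry(String key, Long before) {
            this.key = key;
            this.before = before;
        }
    }

    /** Hypothetical single-bucket table with a SUM aggregation on one column. */
    static class SumBucket {
        final Map<String, Long> values = new HashMap<>();
        final List<LogEntry> changelog = new ArrayList<>();

        /** Normal write path: merges delta into the current value, returns the write's offset. */
        long aggregate(String key, long delta) {
            Long before = values.get(key);
            values.put(key, (before == null ? 0L : before) + delta);
            changelog.add(new LogEntry(key, before));
            return changelog.size() - 1;
        }

        /** Recovery-mode write path: overwrite semantics, no aggregation. */
        void overwrite(String key, Long value) {
            if (value == null) {
                values.remove(key);
            } else {
                values.put(key, value);
            }
        }
    }

    /**
     * Undoes every write after committedOffset (assumption: -1 means nothing was committed).
     * Walking backwards leaves each key at the value it had at committedOffset.
     * The sketch does not log the undo writes themselves.
     */
    static void undo(SumBucket bucket, long committedOffset) {
        for (int i = bucket.changelog.size() - 1; i > committedOffset; i--) {
            LogEntry entry = bucket.changelog.get(i);
            bucket.overwrite(entry.key, entry.before);
        }
    }

    public static void main(String[] args) {
        SumBucket bucket = new SumBucket();
        // Stand-in for WriterState: max committed offset per bucket id.
        Map<Integer, Long> writerState = new HashMap<>();

        bucket.aggregate("a", 5);
        long lastOffset = bucket.aggregate("b", 2);
        writerState.merge(0, lastOffset, Math::max); // checkpoint completes here

        bucket.aggregate("a", 10); // written after the checkpoint, never committed
        bucket.aggregate("c", 7);

        // Failover: restore the committed offset and undo everything after it.
        undo(bucket, writerState.get(0));
        System.out.println(bucket.values);
    }
}
```

After `undo`, key `a` is back at 5, `b` stays at 2, and `c` is gone, so replaying the two uncommitted writes produces 15 and 7 rather than double-counting them. In a real deployment the committed offsets would live in Flink operator state and the old values would be read from the server, which this sketch only approximates.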
