Daishuyuan commented on PR #4341: URL: https://github.com/apache/flink-cdc/pull/4341#issuecomment-4728537434
The current DWS SinkV2 implements an application-level two-phase commit with staging tables and the SinkV2 committer. The writer writes records into checkpoint-scoped staging tables and seals them into committables on checkpoints or schema barriers. The committer then merges the latest staged rows into the target table by primary key inside a DWS transaction and records a commit marker, making retries and recovery idempotent. For schema changes, the runtime now provides a FlushEvent-aware sink writer hook, so the DWS writer can commit data written with the old schema before the schema barrier is applied. The trade-off is that this is not native XA/database-level 2PC in DWS; it is an application-level 2PC. The upside is that it does not depend on native DWS distributed transaction support, fits Flink SinkV2's committer retry/recovery model, and handles schema barriers. The cost is that target tables need primary keys, extra staging and commit marker tables are required, schema flush performs a synchronous commit, and legacy native DWS client batching/retry options are kept only as compatibility options in SinkV2 rather than being used for write tuning. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
