liumingjian opened a new pull request, #4195:
URL: https://github.com/apache/flink-cdc/pull/4195

   ## Summary
   This PR adds a GaussDB CDC connector that supports snapshot and streaming modes. Streaming uses GaussDB's `mppdb_decoding` logical replication plugin.
   
   ## Key Features
   - ✅ Full snapshot support for initial data capture
   - ✅ Streaming CDC using GaussDB logical replication (mppdb_decoding)
   - ✅ Support for all common GaussDB data types
   - ✅ Configurable connection pooling and retry mechanisms
   - ✅ Comprehensive test suite
   
   ## Critical Bug Fixes
   
   ### 1. Default Value Converter Issue
   - **Problem**: GaussDB reports some column defaults as function expressions (e.g., `pg_systimestamp()`, `CURRENT_TIMESTAMP`) rather than literals, and Debezium fails when it tries to use those expressions as actual values
   - **Solution**: Created `GaussDBDefaultValueConverter`, which returns `Optional.empty()` for function-based defaults instead of trying to convert them
   - **Impact**: Fixes schema building errors that prevented connector 
initialization
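
A minimal sketch of the idea behind this fix, not the PR's actual `GaussDBDefaultValueConverter`. The class name `DefaultValueSketch`, the method `literalDefault`, the keyword list, and the "looks like `name(...)`" heuristic are all assumptions made for illustration; the real converter may detect function-based defaults differently.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.Set;
import java.util.regex.Pattern;

/** Illustrative only: hypothetical helper, not code from the PR. */
final class DefaultValueSketch {
    // Assumption: bare keywords that are evaluated at insert time, not literals.
    private static final Set<String> TIME_KEYWORDS = Set.of(
            "CURRENT_TIMESTAMP", "CURRENT_DATE", "CURRENT_TIME",
            "LOCALTIMESTAMP", "LOCALTIME");

    // Assumption: anything shaped like identifier(...) is treated as a function call.
    private static final Pattern FUNCTION_CALL =
            Pattern.compile("^[A-Za-z_][A-Za-z0-9_.]*\\s*\\(.*\\)$", Pattern.DOTALL);

    /** Returns the default expression only when it looks like a literal. */
    static Optional<String> literalDefault(String expression) {
        if (expression == null) {
            return Optional.empty();
        }
        String trimmed = expression.trim();
        if (trimmed.isEmpty()) {
            return Optional.empty();
        }
        boolean functionBased =
                TIME_KEYWORDS.contains(trimmed.toUpperCase(Locale.ROOT))
                        || FUNCTION_CALL.matcher(trimmed).matches();
        // Function-based defaults have no fixed value, so report "no default".
        return functionBased ? Optional.empty() : Optional.of(trimmed);
    }
}
```

With this heuristic, `literalDefault("pg_systimestamp()")` and `literalDefault("CURRENT_TIMESTAMP")` yield an empty `Optional`, while a literal such as `42` is passed through unchanged.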
   
   ### 2. Missing Source Info Fields
   - **Problem**: The Debezium envelope requires several fields (`version`, `connector`, `name`, `snapshot`) in the source struct. They were not being set, which caused validation errors
   - **Solution**: Added all required source info fields to both snapshot and 
streaming source struct builders
   - **Impact**: Fixes runtime errors during data capture
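
The check this fix addresses can be sketched with plain maps instead of Kafka Connect `Struct` objects. Only the four field names come from the description above; the class `SourceInfoSketch`, its methods, the connector value `"gaussdb"`, and the string encoding of `snapshot` are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: models the source struct as a Map. */
final class SourceInfoSketch {
    // Field names taken from the PR description.
    static final List<String> REQUIRED_FIELDS =
            List.of("version", "connector", "name", "snapshot");

    /** Builds a source block with every required field populated. */
    static Map<String, Object> sourceInfo(String version, String serverName, boolean snapshot) {
        Map<String, Object> source = new LinkedHashMap<>();
        source.put("version", version);
        source.put("connector", "gaussdb"); // hypothetical connector identifier
        source.put("name", serverName);
        source.put("snapshot", String.valueOf(snapshot)); // assumed string encoding
        return source;
    }

    /** Lists required fields that are absent or null, in declaration order. */
    static List<String> missingFields(Map<String, ?> source) {
        List<String> missing = new ArrayList<>();
        for (String field : REQUIRED_FIELDS) {
            if (source.get(field) == null) {
                missing.add(field);
            }
        }
        return missing;
    }
}
```

Using the same builder for snapshot and streaming records (differing only in the `snapshot` flag) is one way to keep the two code paths from drifting apart.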
   
   ## Implementation Details
   
   ### Core Components
   - **GaussDBSource**: Main source implementation extending IncrementalSource
   - **GaussDBDialect**: Dialect for GaussDB-specific SQL and behavior
   - **GaussDBConnection**: Connection management with retry logic
   - **GaussDBReplicationConnection**: Logical replication connection handling
   - **GaussDBScanFetchTask**: Snapshot data reading with JDBC
   - **GaussDBStreamFetchTask**: Streaming CDC data reading via logical 
replication
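
The description does not say which retry policy `GaussDBConnection` uses. As a rough sketch, assuming a fixed number of attempts with a fixed delay between them, the retry logic could look like the following. `RetrySketch` and `withRetry` are hypothetical names, not part of the PR.

```java
import java.util.concurrent.Callable;

/** Illustrative only: fixed-delay retry around an arbitrary action. */
final class RetrySketch {
    /**
     * Runs the action up to maxAttempts times, sleeping delayMillis between
     * failed attempts, and rethrows the last failure if every attempt fails.
     */
    static <T> T withRetry(Callable<T> action, int maxAttempts, long delayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                lastFailure = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis); // no sleep after the final attempt
                }
            }
        }
        throw lastFailure;
    }
}
```

A caller would wrap the connection attempt in the `Callable`, for example `withRetry(() -> DriverManager.getConnection(url, user, password), 3, 1000L)`, where the URL and credentials are supplied by the connector configuration.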
   
   ### Configuration Options
   - Hostname, port, database, username, password
   - Plugin name (mppdb_decoding)
   - Slot name for logical replication
   - Connection timeout and retry settings
   - Snapshot fetch size
   - Table include/exclude patterns
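
The description does not spell out how the table include/exclude patterns are evaluated. A minimal sketch, assuming the patterns are Java regular expressions matched against the full `schema.table` identifier, that an empty include list means "capture everything", and that exclusions take precedence (`TableFilterSketch` and `isCaptured` are hypothetical names):

```java
import java.util.List;
import java.util.regex.Pattern;

/** Illustrative only: include/exclude filtering by regular expression. */
final class TableFilterSketch {
    /** Decides whether a table identifier such as "public.orders" is captured. */
    static boolean isCaptured(String tableId, List<String> includes, List<String> excludes) {
        // Assumption: an exclude match always wins over an include match.
        for (String exclude : excludes) {
            if (Pattern.matches(exclude, tableId)) {
                return false;
            }
        }
        // Assumption: no include patterns means every non-excluded table is captured.
        if (includes.isEmpty()) {
            return true;
        }
        for (String include : includes) {
            if (Pattern.matches(include, tableId)) {
                return true;
            }
        }
        return false;
    }
}
```

Because `Pattern.matches` requires the whole identifier to match, a pattern like `public\..*` selects every table in the `public` schema without also matching a schema named `public2`.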
   
   ### Testing
   - ✅ Unit tests for all major components
   - ✅ Integration tests for snapshot and streaming modes
   - ✅ Data type compatibility tests
   - ✅ Boundary condition tests
   - ⚠️ One integration test times out (under investigation)
   
   ## Verified Configuration
   - ✅ GaussDB `wal_level = logical` (required for CDC)
   - ✅ `mppdb_decoding` plugin available and functional
   - ✅ Replication slot creation and management working
   
   ## Known Issues
   - Integration test `testReadSingleTableAllRecords` times out (fetch task 
execution issue under investigation)
   - This appears to be a Flink job initialization issue rather than a data 
reading problem
   - All schema/envelope validation errors have been resolved
   
   ## Test Plan
   - [x] Unit tests pass
   - [x] Code formatting (spotless) passes
   - [ ] Integration test investigation ongoing
   - [ ] Manual testing against a real GaussDB instance
   
   ## Dependencies
   - GaussDB JDBC driver (included in lib/)
   - Debezium PostgreSQL connector (for replication protocol compatibility)
   - Flink CDC base framework
   
   ## Documentation
   - README with usage examples
   - Docker Compose setup for local testing
   - Troubleshooting guide
   - Connectivity diagnosis guide
   
   ## Checklist
   - [x] Code follows project style guidelines
   - [x] Added comprehensive tests
   - [x] Added documentation
   - [x] Fixed critical bugs (default value converter, source info fields)
   - [ ] All tests passing (1 integration test timeout under investigation)
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)
   
   Co-Authored-By: Claude Sonnet 4.5 <[email protected]>

