lvyanquan opened a new pull request, #4391:
URL: https://github.com/apache/flink-cdc/pull/4391
#### Summary
This commit adds BLOB field support to the Flink CDC Paimon connector,
enabling efficient storage and handling of large binary data during CDC
synchronization operations.
#### Key Changes
1. New BlobWriteContext Component
- Introduced BlobWriteContext class to handle BLOB fields during CDC write
operations
- Supports two blob storage modes:
  - Mode 1 (raw data): VARBINARY/BINARY fields → BlobData → written to
    .blob files
  - Mode 2 (descriptor): VARCHAR/STRING fields → BlobRef → only descriptor
    (uri, offset, length) stored inline
- Integrates with Paimon's CoreOptions for blob configuration
2. Schema Evolution Support
- Enhanced SchemaChangeProvider to automatically convert
VARBINARY/BINARY/VARCHAR/STRING types to BLOB type based on blob-field
configuration
- Updated updateColumnType method to handle BLOB type conversion during
schema changes
- Added validation to prevent altering primary key or partition key
columns to BLOB type
3. Writer Integration
- Modified PaimonWriterHelper to support blob field handling
- Updated PaimonRecordEventSerializer for BLOB data serialization
- Enhanced TableSchemaInfo to track blob field metadata
4. Comprehensive Testing
- Added PaimonMetadataApplierTest (468 lines of tests)
- Added PaimonWriterHelperTest for blob write scenarios
- Added AppendOnlyTableITCase integration tests with test fixtures
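
To illustrate the type-to-mode mapping and the key-column validation described under Key Changes, here is a minimal standalone sketch. It is not the connector's code: `BlobModeSketch`, `BlobMode`, `resolveMode`, and `validateBlobColumn` are hypothetical names, and it assumes column types arrive as bare type names without length arguments.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch; not the connector's BlobWriteContext or SchemaChangeProvider. */
public class BlobModeSketch {

    /** The two storage modes named in the PR description, plus a "not a blob" case. */
    public enum BlobMode { RAW_DATA, DESCRIPTOR, NONE }

    /**
     * Maps a source column type name to a blob mode.
     * Assumption: the name is a bare type such as "VARBINARY", with no "(n)" suffix.
     */
    public static BlobMode resolveMode(String typeName) {
        switch (typeName.toUpperCase(Locale.ROOT)) {
            case "VARBINARY":
            case "BINARY":
                return BlobMode.RAW_DATA;   // Mode 1: bytes go to .blob files
            case "VARCHAR":
            case "STRING":
                return BlobMode.DESCRIPTOR; // Mode 2: only (uri, offset, length) inline
            default:
                return BlobMode.NONE;
        }
    }

    /** Rejects conversion to BLOB for primary key or partition key columns. */
    public static void validateBlobColumn(
            String column, Set<String> primaryKeys, Set<String> partitionKeys) {
        if (primaryKeys.contains(column) || partitionKeys.contains(column)) {
            throw new IllegalArgumentException(
                    "Column '" + column + "' is a primary key or partition key "
                            + "and cannot be altered to BLOB");
        }
    }

    public static void main(String[] args) {
        System.out.println(resolveMode("varbinary")); // prints RAW_DATA
        System.out.println(resolveMode("string"));    // prints DESCRIPTOR
        // A non-key column passes validation without throwing.
        validateBlobColumn("content", Collections.singleton("id"), Collections.emptySet());
    }
}
```

The real implementation presumably works on Flink CDC and Paimon `DataType` objects rather than strings; strings are used here only to keep the sketch self-contained.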
#### Configuration Example
##### Enable blob fields via table options
blob-field = content, image_data
blob-descriptor-field = external_file_path
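
As a rough illustration of how a comma-separated option value such as `content, image_data` could be turned into a list of column names, consider the sketch below. `BlobFieldOptionSketch` and `parseFieldList` are hypothetical names, and trimming whitespace around each name is an assumption about the intended behavior, not something the PR description confirms.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of parsing a comma-separated blob field option value. */
public class BlobFieldOptionSketch {

    /** Splits on commas, trims each entry, and skips empty entries. */
    public static List<String> parseFieldList(String value) {
        List<String> fields = new ArrayList<>();
        if (value == null) {
            return fields;
        }
        for (String part : value.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                fields.add(name);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parseFieldList("content, image_data")); // prints [content, image_data]
        System.out.println(parseFieldList("external_file_path"));  // prints [external_file_path]
    }
}
```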
#### JIRA Reference
https://issues.apache.org/jira/browse/FLINK-39567