JunRuiLee opened a new pull request, #7328: URL: https://github.com/apache/paimon/pull/7328
<!-- Please specify the module before the PR name: [core] ... or [flink] ... -->

### Purpose

This PR adds support for descriptor-based BLOB fields that copy raw data to a configured external target directory at write time. For these fields, Paimon writes the raw BLOB bytes to the target directory and stores only serialized `BlobDescriptor`s inline in data files.

The change also adds validation for the new copied-data descriptor options and verifies that raw-data BLOB fields, descriptor-based BLOB fields, and descriptor-based BLOB fields with copied raw data can coexist in the same table.

This PR also refines `MERGE INTO` validation for BLOB columns in Flink and Spark. Updates are still rejected for raw-data BLOB columns, but are now allowed for descriptor-based BLOB columns, including those whose raw data is copied to an external target directory at write time.

### Tests

UT:
- `BlobTableTest#testCopiedDescriptorBlobField`
- `BlobTableTest#testThreeTypeBlobCoexistence`
- `BlobTableTest#testCopiedDescriptorFieldValidationRequiresTargetDir`
- `BlobTableTest#testCopiedDescriptorFieldMustBeSubsetOfDescriptorField`
- `BlobTestBase`: `Blob: merge-into rejects updating raw-data BLOB column`
- `BlobTestBase`: `Blob: merge-into updates non-blob column on descriptor blob table`
- `BlobTestBase`: `Blob: merge-into updates descriptor blob column with copied data end-to-end`

IT:
- `BlobTableITCase#testCopiedDescriptorBlob`
- `BlobTableITCase#testThreeTypeBlobCoexistence`
- `BlobTableITCase#testCopiedDescriptorBlobMultipleWrites`
- `DataEvolutionMergeIntoActionITCase#testUpdateRawBlobColumnThrowsError`
- `DataEvolutionMergeIntoActionITCase#testUpdateNonBlobColumnOnDescriptorBlobTableSucceeds`
- `DataEvolutionMergeIntoActionITCase#testUpdateCopiedDescriptorBlobColumnSucceeds`

### API and Format

<!-- Does this change affect API or storage format -->

### Documentation

<!-- Does this change introduce a new feature -->

### Generative AI tooling

<!-- If generative AI tooling has been used in the process of authoring this patch, please include the phrase: 'Generated-by: ' followed by the name of the tool and its version. If no, write 'No'. Please refer to the [ASF Generative Tooling Guidance](https://www.apache.org/legal/generative-tooling.html) for details. -->

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
