leaves12138 opened a new pull request, #7602: URL: https://github.com/apache/paimon/pull/7602
## Purpose

This PR introduces `BLOB_REF` for sharing blob data across tables without duplicating payloads in Paimon-managed storage.

## Changes

- add the `BLOB_REF` type and wire it through API, format, Arrow, Flink, Spark and Hive type conversions
- serialize `BLOB_REF` values as `BlobReference` metadata instead of inline blob payloads
- resolve blob references lazily on read, preferring direct URI reads and falling back to metadata lookup by table/row/field
- keep the fallback path streaming instead of buffering the whole blob into memory
- add `fieldId` to blob references for better schema evolution compatibility during fallback lookup
- avoid dereferencing blob payloads in `InternalRowToSizeVisitor`
- explicitly reject nested `BLOB_REF` in schema validation, since read-time resolution currently only supports top-level `BLOB_REF`
- add unit tests for blob reference serialization, fallback streaming, size estimation, schema validation and fallback lookup

## Testing

Passed:

- `mvn -pl paimon-common -am -DfailIfNoTests=false -Dcheckstyle.skip -Dspotless.check.skip -Denforcer.skip -Dtest=BlobReferenceTest,BlobReferenceBlobTest,InternalRowToSizeVisitorTest test`

Attempted but blocked by unrelated existing compile errors in `paimon-core/src/main/java/org/apache/paimon/rest/RESTCatalog.java` (`StringUtils` import/usage):

- `mvn -pl paimon-core -am -DfailIfNoTests=false -Dcheckstyle.skip -Dspotless.check.skip -Denforcer.skip -Dtest=BlobRefSchemaValidationTest,BlobReferenceLookupTest test`
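
For readers unfamiliar with the read path, the lazy resolution described under Changes (try the direct URI first, fall back to a lookup by table/row/field, and stream in both cases) might look roughly like the sketch below. It is illustrative only: `BlobRefSketch`, `Ref`, `Opener`, `LazyBlob` and `demo` are hypothetical names, not the classes added by this PR, and the field layout of the reference is an assumption based on the description (URI plus table, row and `fieldId`). It uses `InputStream.readAllBytes`, so it assumes Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: none of these names are Paimon's actual classes.
final class BlobRefSketch {

    /** Metadata stored in the row instead of the payload (fields are assumptions). */
    static final class Ref {
        final String uri;   // direct location of the payload
        final String table; // source table, used only by the fallback
        final long rowId;   // row in the source table
        final int fieldId;  // stable field id, unaffected by column renames

        Ref(String uri, String table, long rowId, int fieldId) {
            this.uri = uri;
            this.table = table;
            this.rowId = rowId;
            this.fieldId = fieldId;
        }
    }

    /** Opens a stream for a reference; implementations should not buffer the whole blob. */
    interface Opener {
        InputStream open(Ref ref) throws IOException;
    }

    /** Holds only the reference; the payload is touched when a stream is requested. */
    static final class LazyBlob {
        private final Ref ref;
        private final Opener direct;
        private final Opener fallback;

        LazyBlob(Ref ref, Opener direct, Opener fallback) {
            this.ref = ref;
            this.direct = direct;
            this.fallback = fallback;
        }

        /** Cheap accessor, e.g. for size estimation without reading the payload. */
        Ref reference() {
            return ref;
        }

        /** Prefer the direct URI; on I/O failure, fall back to the metadata lookup. */
        InputStream newInputStream() throws IOException {
            try {
                return direct.open(ref);
            } catch (IOException e) {
                return fallback.open(ref);
            }
        }
    }

    /** Simulates a moved file: the direct read fails, the fallback serves the bytes. */
    static String demo() throws IOException {
        Ref ref = new Ref("file:/moved/blob.bin", "db.source", 42L, 7);
        Opener direct = r -> {
            throw new FileNotFoundException(r.uri);
        };
        Opener fallback = r -> new ByteArrayInputStream(
                (r.table + "/" + r.rowId + "/" + r.fieldId).getBytes(StandardCharsets.UTF_8));
        LazyBlob blob = new LazyBlob(ref, direct, fallback);
        try (InputStream in = blob.newInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

In this sketch both paths return an `InputStream`, so a caller never learns which path served the data, and nothing is read until `newInputStream()` is called. The stand-in fallback here serves a small in-memory array purely for demonstration; a real fallback would stream from the source table's storage.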
