leaves12138 opened a new pull request, #7602:
URL: https://github.com/apache/paimon/pull/7602

   ## Purpose
   
   This PR introduces a `BLOB_REF` type for sharing blob data across tables without duplicating payloads in Paimon-managed storage.
   
   ## Changes
   
   - add the `BLOB_REF` type and wire it through the API, format, Arrow, Flink, Spark and Hive type conversions
   - serialize `BLOB_REF` values as `BlobReference` metadata instead of inline blob payloads
   - resolve blob references lazily on read, preferring direct URI reads and falling back to metadata lookup by table/row/field
   - keep the fallback path streaming instead of buffering the whole blob in memory
   - add `fieldId` to blob references to improve schema evolution compatibility during fallback lookup
   - avoid dereferencing blob payloads in `InternalRowToSizeVisitor`
   - explicitly reject nested `BLOB_REF` in schema validation, since read-time resolution currently supports only top-level `BLOB_REF`
   - add unit tests for blob reference serialization, fallback streaming, size estimation, schema validation and fallback lookup
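
   The serialization and lazy-resolution bullets above can be illustrated with a minimal sketch. It assumes a hypothetical `Ref` record (URI, table, row id, field id) and hypothetical `UriReader`/`MetadataLookup` interfaces. It is not Paimon's actual `BlobReference` class or wire format. It shows that only reference metadata is serialized, that nothing is read until `resolve` is called, and that the fallback path returns a stream instead of a buffered byte array.

   ```java
   import java.io.ByteArrayInputStream;
   import java.io.ByteArrayOutputStream;
   import java.io.DataInputStream;
   import java.io.DataOutputStream;
   import java.io.IOException;
   import java.io.InputStream;
   import java.io.UncheckedIOException;

   /** Hypothetical sketch: names and byte layout are assumptions, not Paimon's API. */
   public class BlobRefSketch {

       /** Reference metadata only; the blob payload is never stored here. */
       public record Ref(String uri, String table, long rowId, int fieldId) {

           /** Writes the metadata fields; no payload bytes are written. */
           public byte[] serialize() {
               ByteArrayOutputStream bytes = new ByteArrayOutputStream();
               try (DataOutputStream out = new DataOutputStream(bytes)) {
                   out.writeUTF(uri);
                   out.writeUTF(table);
                   out.writeLong(rowId);
                   out.writeInt(fieldId);
               } catch (IOException e) {
                   throw new UncheckedIOException(e);
               }
               return bytes.toByteArray();
           }

           /** Reads the fields back in the same order they were written. */
           public static Ref deserialize(byte[] data) {
               try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
                   return new Ref(in.readUTF(), in.readUTF(), in.readLong(), in.readInt());
               } catch (IOException e) {
                   throw new UncheckedIOException(e);
               }
           }
       }

       /** Hypothetical direct reader; throws IOException if the URI cannot be read. */
       public interface UriReader {
           InputStream open(String uri) throws IOException;
       }

       /** Hypothetical fallback lookup keyed by table, row id and field id. */
       public interface MetadataLookup {
           InputStream open(String table, long rowId, int fieldId) throws IOException;
       }

       /**
        * Resolves lazily: nothing is read until this is called. Tries the direct
        * URI first, then the metadata lookup; both return a stream, so the payload
        * is not buffered in full here. If both fail, the direct-read error is
        * attached to the fallback error as a suppressed exception.
        */
       public static InputStream resolve(Ref ref, UriReader reader, MetadataLookup lookup)
               throws IOException {
           try {
               return reader.open(ref.uri());
           } catch (IOException directFailure) {
               try {
                   return lookup.open(ref.table(), ref.rowId(), ref.fieldId());
               } catch (IOException fallbackFailure) {
                   fallbackFailure.addSuppressed(directFailure);
                   throw fallbackFailure;
               }
           }
       }
   }
   ```

   Returning an `InputStream` from both paths leaves the caller in control of how much of the blob is read, which matches the goal of keeping the fallback path streaming.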
   
   ## Testing
   
   Passed:
   - `mvn -pl paimon-common -am -DfailIfNoTests=false -Dcheckstyle.skip -Dspotless.check.skip -Denforcer.skip -Dtest=BlobReferenceTest,BlobReferenceBlobTest,InternalRowToSizeVisitorTest test`
   
   Attempted, but blocked by unrelated pre-existing compile errors in `paimon-core/src/main/java/org/apache/paimon/rest/RESTCatalog.java` (`StringUtils` import/usage):
   - `mvn -pl paimon-core -am -DfailIfNoTests=false -Dcheckstyle.skip -Dspotless.check.skip -Denforcer.skip -Dtest=BlobRefSchemaValidationTest,BlobReferenceLookupTest test`
   