steFaiz opened a new pull request, #8191:
URL: https://github.com/apache/paimon/pull/8191

   ### Purpose
   Support the Blob format in FormatTable.
   The motivation is to replace the object store with Paimon on DFS, unifying 
storage engines. Consider this scenario:
   1. Users parse large videos, splitting each one into hundreds of images.
   2. This is typically done by a UDF: the input is a video and the output is 
a JSON map of <ImageIdentifier, ImageURL>. The results are exported to 
structured storage, e.g. ODPS.
   3. Image splitting and upload both happen inside the UDF. Previously, the 
images were uploaded to OSS. Now a Paimon FormatTable can store them.
   
   The key advantages are:
   1. Partition-level management: drop/overwrite partitions to manage blob 
lifecycle natively
   2. Drastically fewer files: N blobs packed into one file instead of N 
separate objects.
   3. BlobDescriptor output: each written blob returns a descriptor (path + 
offset + length) that downstream structured tables (e.g., ODPS) can consume via 
UDF for random access.
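
   To illustrate advantages 2 and 3, here is a minimal, self-contained sketch 
of packing several blobs into one file and reading a single blob back through 
a (path + offset + length) descriptor. It does not use Paimon's API or its 
on-disk blob layout: `BlobPackSketch`, `BlobRef`, `pack` and `read` are 
hypothetical names chosen for this example, and the file is a plain 
concatenation of bytes on the local file system.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class BlobPackSketch {

    /** Hypothetical stand-in for a blob descriptor: file path + offset + length. */
    public static final class BlobRef {
        public final String path;
        public final long offset;
        public final int length;

        public BlobRef(String path, long offset, int length) {
            this.path = path;
            this.offset = offset;
            this.length = length;
        }
    }

    /** Writes all blobs back to back into one file and returns one descriptor per blob. */
    public static List<BlobRef> pack(Path file, List<byte[]> blobs) throws IOException {
        List<BlobRef> refs = new ArrayList<>();
        long offset = 0;
        try (OutputStream out = Files.newOutputStream(file)) {
            for (byte[] blob : blobs) {
                out.write(blob);
                refs.add(new BlobRef(file.toString(), offset, blob.length));
                offset += blob.length;
            }
        }
        return refs;
    }

    /** Random access: seeks to the descriptor's offset and reads only that blob. */
    public static byte[] read(BlobRef ref) throws IOException {
        try (RandomAccessFile in = new RandomAccessFile(ref.path, "r")) {
            byte[] buf = new byte[ref.length];
            in.seek(ref.offset);
            in.readFully(buf);
            return buf;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("blobs", ".bin");
        List<BlobRef> refs = pack(file, List.of(
                "frame-0".getBytes(StandardCharsets.UTF_8),
                "frame-1-longer".getBytes(StandardCharsets.UTF_8)));
        // Fetch only the second blob; prints "frame-1-longer".
        System.out.println(new String(read(refs.get(1)), StandardCharsets.UTF_8));
        Files.delete(file);
    }
}
```

   In this sketch the descriptor is the only thing a downstream table needs to 
store per image, which is why N blobs can share one file without losing 
per-blob access.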
   
   #### Restriction
   Currently, a Blob format table may have only one non-partition column.
    
   ### Tests
   See `org.apache.paimon.table.format.FormatTableBlobTest` 

