steFaiz opened a new pull request, #8191:
URL: https://github.com/apache/paimon/pull/8191
### Purpose
Support the Blob format in FormatTable.
The goal is to replace the object store with Paimon on DFS, unifying the storage
engines. Consider this scenario:
1. Users parse large videos, splitting each one into hundreds of images.
2. This is typically done in a UDF: the input is a video and the output is a
JSON map of <ImageIdentifier, ImageURL>. The results are exported to
structured storage, e.g. ODPS.
3. Image splitting and upload both happen inside the UDF. Previously the
images were uploaded to OSS; now a Paimon FormatTable can store them instead.
The key advantages are:
1. Partition-level management: drop or overwrite partitions to manage the blob
lifecycle natively.
2. Drastically fewer files: N blobs packed into one file instead of N
separate objects.
3. BlobDescriptor output: each written blob returns a descriptor (path +
offset + length) that downstream structured tables (e.g., ODPS) can consume via
UDF for random access.
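
To illustrate advantages 2 and 3, here is a minimal sketch of packing several blobs into one file and reading a single blob back by byte range. It is not the code in this PR and does not reproduce Paimon's actual blob file layout: `BlobPackSketch`, `BlobRef`, `pack`, and `read` are hypothetical names, the file is a plain concatenation of bytes on the local file system, and only the path + offset + length idea is taken from the description above.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class BlobPackSketch {

    /** Hypothetical descriptor: where one blob lives inside a packed file. */
    static final class BlobRef {
        final String path;
        final long offset;
        final long length;

        BlobRef(String path, long offset, long length) {
            this.path = path;
            this.offset = offset;
            this.length = length;
        }
    }

    /** Writes all blobs back to back into one file and records each byte range. */
    static List<BlobRef> pack(Path file, List<byte[]> blobs) throws IOException {
        List<BlobRef> refs = new ArrayList<>();
        long offset = 0;
        try (OutputStream out = Files.newOutputStream(file)) {
            for (byte[] blob : blobs) {
                out.write(blob);
                refs.add(new BlobRef(file.toString(), offset, blob.length));
                offset += blob.length;
            }
        }
        return refs;
    }

    /** Reads exactly one blob by seeking to its offset; other blobs are not read. */
    static byte[] read(BlobRef ref) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(ref.path, "r")) {
            byte[] buf = new byte[Math.toIntExact(ref.length)];
            raf.seek(ref.offset);
            raf.readFully(buf);
            return buf;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("blobs", ".bin");
        List<BlobRef> refs = pack(file, List.of(
                "frame-0".getBytes(StandardCharsets.UTF_8),
                "frame-1-data".getBytes(StandardCharsets.UTF_8)));
        // Fetch only the second blob via its descriptor.
        System.out.println(new String(read(refs.get(1)), StandardCharsets.UTF_8));
        Files.delete(file);
    }
}
```

In the scenario above, the UDF would emit something like the `BlobRef` triple in place of an OSS URL, and a downstream UDF could use it to fetch one image without scanning the rest of the file.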
#### Restriction
Currently, a Blob Format Table may have only one non-partition column.
### Tests
See `org.apache.paimon.table.format.FormatTableBlobTest`