Re: [PR] [core] FormatTable supports Blob Format [paimon]

via GitHub Wed, 10 Jun 2026 23:17:54 -0700


steFaiz commented on PR #8191:
URL: https://github.com/apache/paimon/pull/8191#issuecomment-4677747795


   > Why not just use a Paimon table to store objects?
   
   Thanks! Let me explain this. My scenario is:
   * A Spark/Flink UDF takes images as input and immediately outputs a JSON
Map<String, BlobDescriptor>: each image (blob) is written out, and the UDF
directly produces its descriptor (path + offset + length) for the downstream
consumer (ODPS). Previously this was done by uploading each image to its own
OSS file; I'm trying to replace OSS by writing directly to Paimon on DFS.
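To make the descriptor idea concrete, here is a minimal sketch, assuming a descriptor is just the (path, offset, length) triple mentioned above. `BlobDescriptor`, `pack_blobs`, and `descriptors_to_json` are hypothetical names for illustration only; this is not Paimon's Blob file format or writer API, which handle packing internally. It shows many images appended to one packed file, with the UDF emitting one descriptor per image instead of one OSS object per image.

```python
import json
import os
import tempfile
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class BlobDescriptor:
    """Hypothetical descriptor: where one blob lives inside a packed file."""
    path: str
    offset: int
    length: int


def pack_blobs(blobs: dict[str, bytes], path: str) -> dict[str, BlobDescriptor]:
    """Append every blob to a single file and return a descriptor per key."""
    descriptors = {}
    with open(path, "wb") as f:
        for key, data in blobs.items():
            offset = f.tell()  # current end of the file = start of this blob
            f.write(data)
            descriptors[key] = BlobDescriptor(path, offset, len(data))
    return descriptors


def descriptors_to_json(descriptors: dict[str, BlobDescriptor]) -> str:
    """Serialize the Map<String, BlobDescriptor> the UDF would emit downstream."""
    return json.dumps({k: asdict(d) for k, d in descriptors.items()})


if __name__ == "__main__":
    out = os.path.join(tempfile.mkdtemp(), "blobs-0.bin")
    descs = pack_blobs({"img_a": b"<png bytes>", "img_b": b"<jpeg bytes>"}, out)
    print(descriptors_to_json(descs))
```

The key design point is that the downstream system stores only the small JSON descriptors; the image bytes stay packed together in a few large files rather than many tiny objects.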
   
   Why isn't an append table suitable?
   * With a Paimon append table, each UDF instance needs to commit on
`close()`, so every instance commits once. A Spark job can therefore produce
hundreds of concurrent commits! A Format Table's commit, by contrast, is
lightweight.
   
   I'm exploring using a Paimon Format Table to replace OSS, acting purely as an
archive for blobs. Users always refer to blobs by descriptor only (never through
a full scan), and can still benefit from Paimon's blob packing, partition
management, and table management.
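Descriptor-only access means a reader never scans the table: it opens the referenced file, seeks to the offset, and reads exactly `length` bytes. A minimal sketch, assuming the same hypothetical (path, offset, length) descriptor and a local filesystem path; a real DFS client would expose its own seekable input stream, and `BlobDescriptor` and `read_blob` are illustrative names, not Paimon APIs.

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass(frozen=True)
class BlobDescriptor:
    """Hypothetical descriptor: (path, offset, length) of one blob."""
    path: str
    offset: int
    length: int


def read_blob(desc: BlobDescriptor) -> bytes:
    """Fetch exactly one blob with a seek plus a bounded read; no table scan."""
    with open(desc.path, "rb") as f:
        f.seek(desc.offset)
        data = f.read(desc.length)
    if len(data) != desc.length:
        # The descriptor points past the end of the file (truncated or wrong file).
        raise OSError(f"short read: expected {desc.length} bytes, got {len(data)}")
    return data


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "packed.bin")
    with open(path, "wb") as f:
        f.write(b"headerIMAGE-BYTEStrailer")
    print(read_blob(BlobDescriptor(path, 6, 11)))  # b'IMAGE-BYTES'
```

Because each lookup costs one seek and one bounded read, the cost is independent of how many other blobs share the packed file.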


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.