steFaiz commented on PR #8191: URL: https://github.com/apache/paimon/pull/8191#issuecomment-4677747795
> Why not just using Paimon table to store objects?

Thanks! Let me explain. My scenario is:

* A Spark/Flink UDF takes images as input and immediately outputs a JSON `Map<String, BlobDescriptor>`, i.e. each image (blob) is written out and the UDF directly produces its descriptor (path + offset + length) for the downstream consumer (ODPS).

Previously this was done by uploading each image to an individual OSS file. I'm trying to replace OSS by writing directly to Paimon on DFS.

Why is an append table not suitable?

* With a Paimon append table, each UDF instance needs to commit in `close()`, so each instance commits once. For Spark jobs, that can mean hundreds of concurrent commits!

A format table's commit is pretty lightweight. I'm exploring using a Paimon Format Table to replace OSS, acting purely as an archive for blobs. Users always refer to blobs by descriptor only (never a full scan), and can still take advantage of Paimon's blob packing, partition management, and table management.
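
To make the descriptor-only access pattern concrete, here is a minimal Python sketch. It does not use Paimon's API: the `BlobDescriptor` dataclass, the helpers `pack_blobs`, `read_blob`, and `to_json`, and the local packed file are all hypothetical stand-ins, assumed only for illustration. It shows several blobs packed into one file, a JSON map of key to descriptor (path + offset + length), and a read that seeks straight to one blob.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class BlobDescriptor:
    """Hypothetical descriptor: where one blob's bytes live."""
    path: str
    offset: int
    length: int


def pack_blobs(path, blobs):
    """Write all blobs into one packed file; return a descriptor per key."""
    descriptors = {}
    with open(path, "wb") as out:
        for key, data in blobs.items():
            offset = out.tell()  # start position of this blob in the file
            out.write(data)
            descriptors[key] = BlobDescriptor(path, offset, len(data))
    return descriptors


def read_blob(desc):
    """Fetch one blob by seeking to its offset; the file is not scanned."""
    with open(desc.path, "rb") as f:
        f.seek(desc.offset)
        return f.read(desc.length)


def to_json(descriptors):
    """Serialize the key -> descriptor map, as the UDF output is assumed to look."""
    return json.dumps({k: asdict(d) for k, d in descriptors.items()})


def demo():
    """Pack two fake images, round-trip descriptors via JSON, read them back."""
    with tempfile.TemporaryDirectory() as tmp:
        packed = os.path.join(tmp, "blobs.bin")
        descs = pack_blobs(packed, {"a.png": b"AAAA", "b.png": b"BBBBBB"})
        payload = to_json(descs)
        restored = {k: BlobDescriptor(**v) for k, v in json.loads(payload).items()}
        return {k: read_blob(d) for k, d in restored.items()}
```

In this sketch the downstream consumer only needs the JSON payload: it rebuilds each descriptor and reads exactly `length` bytes at `offset`, which is the property that makes a lightweight archive-style table sufficient for this use case.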
