Hi Jingsong:

Thanks for raising this discussion!
`blob-external-storage` has been used in our internal use cases for some
time, and I'd like to share some of our experience with it.

Our motivations were:
* The basic mechanism, "write blobs somewhere, then store descriptors", is
suitable for nested blobs, e.g. Array<Blob> and Map<Any, Blob>, which are
crucial in video processing.
* For some super-large objects (e.g. 500GB), we use the
`blob-external-storage` mechanism together with DFS leases to implement
resumable transfer (each blob is written to a single file, so the write can
be resumed on retry).
* A single DFS directory is limited to about 200,000 files, and some
datasets are too big to store in a single DFS mount point. We extended
`blob-external-storage` to support multiple paths, similar to
`data-file.external-paths`.

For management, datasets that fit in a single DFS mount point store their
external blobs under an internal path such as
`table/partition/bucket-0/blobs`, so that those blob files are deleted when
the partition is dropped.
However, more fine-grained management is hard to implement.

Overall, I think `blob-external-storage` can be removed. It was somewhat of
a temporary solution for us.

On Mon, Jun 29, 2026 at 10:43 AM Jingsong Li <[email protected]> wrote:

> Hi everyone,
>
> At present, the blob external storage field mechanism places data in
> a completely unmanaged external environment, which contradicts
> Paimon's design. I am considering removing this mechanism.
>
> What do you think? I want to hear if there are any objections from the
> community.
>
> Best,
> Jingsong
>
