Hi Jingsong,

This option was originally introduced to support BLOB updates — at the time
we didn't have a managed way to update BLOB columns without rewriting the
raw bytes. That's now covered by the data-evolution placeholder mechanism,
so the original reason no longer holds. +1 from my side.

Best,
Junrui

On Mon, Jun 29, 2026 at 11:09 AM wang <[email protected]> wrote:

> Hi Jingsong:
>
> Thanks for raising this discussion!
> `blob-external-storage` has been used in our internal cases for some time.
> I'd like to share some of our experience.
>
> The motivation is as follows:
> * The basic mechanism, "write blobs somewhere, then store descriptors", is
> suitable for nested blobs, e.g. Array<Blob> and Map<Any, Blob>, which are
> crucial in video processing.
> * For some super-large objects (like 500GB), we use the `blob-external-storage`
> mechanism and a DFS lease to implement resumable transfer (the blob is written
> to a single file, and the write can be resumed on retry).
> * For a single directory, DFS has a limit of about 200,000 files. Some
> datasets are too big to store in a single DFS mount point, so we extended
> `blob-external-storage` to support multiple paths, similar to
> `data-file.external-paths`.
>
> For management, for datasets that can be stored in a single DFS mount point,
> we store external blobs in an internal path, like
> `table/partition/bucket-0/blobs`, so that those blob files can be deleted on
> drop partition.
> But more fine-grained management is hard to implement.
>
> Overall, I think `blob-external-storage` can be removed. It's somewhat of a
> temporary solution for us.
>
> On Mon, Jun 29, 2026 at 10:43 AM Jingsong Li <[email protected]>
> wrote:
>
> > Hi everyone,
> >
> > At present, the blob external storage field mechanism places data in
> > a completely unmanaged external environment, which contradicts
> > Paimon's design. I am considering removing this mechanism.
> >
> > What do you think? I want to hear if there are any objections from the
> > community.
> >
> > Best,
> > Jingsong
> >
>
