And to Nicholas and Junrui,

Thank you for your positive feedback.

Best,
Jingsong

On Mon, Jun 29, 2026 at 11:28 AM Jingsong Li <[email protected]> wrote:
>
> Hi Nicholas,
>
> For large blobs, we can use `data-file.external-paths`, which is
> managed and secure.
>
> Best,
> Jingsong
>
> On Mon, Jun 29, 2026 at 11:24 AM Jingsong Li <[email protected]> wrote:
> >
> > Thanks, Wang, for your feedback.
> >
> > Let's move forward together towards a more rational way of data management.
> >
> > Best,
> > Jingsong
> >
> > On Mon, Jun 29, 2026 at 11:15 AM Junrui Lee <[email protected]> wrote:
> > >
> > > Hi Jingsong,
> > >
> > > This option was originally introduced to support BLOB updates. At the time,
> > > we didn't have a managed way to update BLOB columns without rewriting the
> > > raw bytes. That's now covered by the data-evolution placeholder mechanism,
> > > so the original reason no longer holds. +1 from my side.
> > >
> > > Best,
> > > Junrui
> > >
> > > On Mon, Jun 29, 2026 at 11:09 AM wang <[email protected]> wrote:
> > >
> > > > Hi Jingsong:
> > > >
> > > > Thanks for raising this discussion!
> > > > `blob-external-storage` has been used in our internal cases for some time.
> > > > I'd like to share some of our experience.
> > > >
> > > > The motivation is:
> > > > * The basic mechanism, "write blobs somewhere, then store descriptors", is
> > > > suitable for nested blobs, e.g. Array<Blob> and Map<Any, Blob>, which are
> > > > crucial in video processing.
> > > > * For some super-large objects (such as 500GB), we use the
> > > > `blob-external-storage` mechanism together with a DFS lease to implement
> > > > resumable transfer (each blob is written to a single file, so the write can
> > > > be resumed on retry).
> > > > * DFS limits a single directory to about 200,000 files, and some datasets
> > > > are too big to store in a single DFS mount point. We extended
> > > > `blob-external-storage` to support multiple paths, like
> > > > `data-file.external-paths`.
> > > >
> > > > For management: for datasets that can be stored in a single DFS mount
> > > > point, we store external blobs under an internal path such as
> > > > `table/partition/bucket-0/blobs`, so that those blob files can be deleted
> > > > when the partition is dropped.
> > > > However, more fine-grained management is hard to implement.
> > > >
> > > > Overall, I think `blob-external-storage` can be removed. It was somewhat
> > > > of a temporary solution for us.
> > > >
> > > > On Mon, Jun 29, 2026 at 10:43 AM Jingsong Li <[email protected]> wrote:
> > > >
> > > > > Hi everyone,
> > > > >
> > > > > At present, the blob external storage field mechanism places data in
> > > > > a completely unmanaged external environment, which contradicts
> > > > > Paimon's design. I am considering removing this mechanism.
> > > > >
> > > > > What do you think? I'd like to hear whether there are any objections
> > > > > from the community.
> > > > >
> > > > > Best,
> > > > > Jingsong
> > > > >
> > > >
