Hi all,

The PR is https://github.com/apache/paimon/pull/8378

Anyone who wants further discussion can give feedback at any time, and we
can also revert the change after it is merged.

Best,
Jingsong

On Mon, Jun 29, 2026 at 11:29 AM Jingsong Li <[email protected]> wrote:
>
> And Nicholas and Junrui,
>
> Thank you for your positive feedback.
>
> Best,
> Jingsong
>
> On Mon, Jun 29, 2026 at 11:28 AM Jingsong Li <[email protected]> wrote:
> >
> > Hi Nicholas,
> >
> > For large Blobs, we can use 'data-file.external-paths', which is
> > managed and secure.
> >
> > Best,
> > Jingsong
> >
> > On Mon, Jun 29, 2026 at 11:24 AM Jingsong Li <[email protected]> wrote:
> > >
> > > Thanks, wang, for your feedback.
> > >
> > > Let's move forward together towards a more rational way of data 
> > > management.
> > >
> > > Best,
> > > Jingsong
> > >
> > > On Mon, Jun 29, 2026 at 11:15 AM Junrui Lee <[email protected]> wrote:
> > > >
> > > > Hi Jingsong,
> > > >
> > > > This option was originally introduced to support BLOB updates: at the time
> > > > we didn't have a managed way to update BLOB columns without rewriting the
> > > > raw bytes. That's now covered by the data-evolution placeholder mechanism,
> > > > so the original reason no longer holds. +1 from my side.
> > > >
> > > > Best,
> > > > Junrui
> > > >
> > > > On Mon, Jun 29, 2026 at 11:09 wang <[email protected]> wrote:
> > > >
> > > > > Hi Jingsong:
> > > > >
> > > > > Thanks for raising this discussion!
> > > > > We have been using `blob-external-storage` internally for some time,
> > > > > and I'd like to share our experience.
> > > > >
> > > > > Our motivations were:
> > > > > * The basic mechanism, "write blobs somewhere, then store descriptors",
> > > > > is suitable for nested blobs, e.g. Array<Blob> and Map<Any, Blob>, which
> > > > > are crucial in video processing.
> > > > > * For some super-large objects (e.g. 500 GB), we use the
> > > > > `blob-external-storage` mechanism together with a DFS lease to implement
> > > > > resumable transfer (each blob is written to a single file, so the write
> > > > > can be resumed on retry).
> > > > > * DFS limits a single directory to about 200,000 files, and some datasets
> > > > > are too big to store under a single DFS mount point. We extended
> > > > > `blob-external-storage` to support multiple paths, similar to
> > > > > `data-file.external-paths`.
> > > > >
> > > > > For management, when a dataset fits under a single DFS mount point, we
> > > > > store external blobs in an internal path such as
> > > > > `table/partition/bucket-0/blobs`, so that those blob files are deleted
> > > > > when the partition is dropped.
> > > > > However, finer-grained management is hard to implement.
> > > > >
> > > > > Overall, I think `blob-external-storage` can be removed. It was a
> > > > > somewhat temporary solution for us.
> > > > >
> > > > > On Mon, Jun 29, 2026 at 10:43 AM Jingsong Li <[email protected]>
> > > > > wrote:
> > > > >
> > > > > > Hi everyone,
> > > > > >
> > > > > > At present, the blob external storage field mechanism places data in
> > > > > > a completely unmanaged external environment, which contradicts
> > > > > > Paimon's design. I am considering removing this mechanism.
> > > > > >
> > > > > > What do you think? I'd like to hear whether the community has any
> > > > > > objections.
> > > > > >
> > > > > > Best,
> > > > > > Jingsong
> > > > > >
> > > > >
