Hi Jingsong,

This option was originally introduced to support BLOB updates, because at the time we didn't have a managed way to update BLOB columns without rewriting the raw bytes. That's now covered by the data-evolution placeholder mechanism, so the original reason no longer holds.

+1 from my side.

Best,
Junrui

wang <[email protected]> wrote on Mon, Jun 29, 2026 at 11:09:

> Hi Jingsong,
>
> Thanks for raising this discussion!
> `blob-external-storage` has been used in our internal cases for some time.
> I'd like to share some of our experience.
>
> The motivation is that:
> * The basic mechanism, "write blobs somewhere, then store descriptors", is
>   suitable for nested blobs, e.g. Array<Blob>, Map<Any, Blob>, which is
>   crucial in video processing.
> * For some super-large objects (like 500GB), we use the
>   `blob-external-storage` mechanism and a DFS lease to implement resumable
>   transfer (write a blob to a single file, which can be resumed on retry).
> * For a single directory, DFS has a limit of about 200,000 files. Some
>   datasets are too big to store in a single DFS mount point. We extended
>   `blob-external-storage` to support multiple paths, like
>   `data-file.external-paths`.
>
> For management, for datasets that can be stored in a single DFS mount point,
> we store external blobs in an internal path, like
> `table/partition/bucket-0/blobs`, so that those blob files can be deleted on
> drop partition.
> But more fine-grained management is hard to implement.
>
> Overall, I think `blob-external-storage` can be removed. It's a somewhat
> temporary solution for us.
>
> On Mon, Jun 29, 2026 at 10:43 AM Jingsong Li <[email protected]> wrote:
>
> > Hi everyone,
> >
> > At present, the blob external storage field mechanism places data in
> > a completely unmanaged external environment, which contradicts
> > Paimon's design. I am considering removing this mechanism.
> >
> > What do you think? I want to hear if there are any objections from the
> > community.
> >
> > Best,
> > Jingsong
