Hi Jingsong,

Thanks for raising this discussion! `blob-external-storage` has been used in our internal cases for some time, and I'd like to share some of our experience.
Our motivation was:

* The basic mechanism, "write blobs somewhere, then store descriptors", is suitable for nested blobs, e.g. Array<Blob> or Map<Any, Blob>, which are crucial in video processing.
* For some super-large objects (like 500GB), we use the `blob-external-storage` mechanism together with DFS leases to implement resumable transfer (each blob is written to a single file, and the write can be resumed on retry).
* For a single directory, DFS has a limit of about 200,000 files, and some datasets are too big to store in a single DFS mount point. We extended `blob-external-storage` to support multiple paths, like `data-file.external-paths`.

For management, when a dataset can be stored in a single DFS mount point, we store external blobs in an internal path, like `table/partition/bucket-0/blobs`, so that those blob files can be deleted on drop partition. But more fine-grained management is hard to implement.

Overall, I think `blob-external-storage` can be removed. It was a somewhat temporary solution for us.

On Mon, Jun 29, 2026 at 10:43 AM Jingsong Li <[email protected]> wrote:
> Hi everyone,
>
> At present, the bloom external storage field mechanism places data in
> a completely unmanaged external environment, which contradicts
> Paimon's design. I consider removing this mechanism.
>
> What do you think? I want to hear if there are any objections from the
> community.
>
> Best,
> Jingsong
