+1 on removing this mechanism. It is hard to manage, and users can use 'data-file.external-paths' instead.

On Mon, Jun 29, 2026 at 12:03 PM Jingsong Li wrote:
> Hi all,
>
> PR is https://github.com/apache/paimon/pull/8378
>
> Anyone who feels the need for discussion can provide feedback at any
> time, and we can still revert it after it is merged.
>
> Best,
> Jingsong
>
> On Mon, Jun 29, 2026 at 11:29 AM Jingsong Li wrote:
> >
> > And Nicholas and Junrui,
> >
> > Thank you for your positive feedback.
> >
> > Best,
> > Jingsong
> >
> > On Mon, Jun 29, 2026 at 11:28 AM Jingsong Li wrote:
> > >
> > > Hi Nicholas,
> > >
> > > For large Blobs, we can use 'data-file.external-paths', which is
> > > managed and secure.
> > >
> > > Best,
> > > Jingsong
> > >
> > > On Mon, Jun 29, 2026 at 11:24 AM Jingsong Li wrote:
> > > >
> > > > Thanks, wang, for your feedback.
> > > >
> > > > Let's move forward together towards a more rational way of data
> > > > management.
> > > >
> > > > Best,
> > > > Jingsong
> > > >
> > > > On Mon, Jun 29, 2026 at 11:15 AM Junrui Lee wrote:
> > > > >
> > > > > Hi Jingsong,
> > > > >
> > > > > This option was originally introduced to support BLOB updates. At
> > > > > the time we didn't have a managed way to update BLOB columns
> > > > > without rewriting the raw bytes. That's now covered by the
> > > > > data-evolution placeholder mechanism, so the original reason no
> > > > > longer holds. +1 from my side.
> > > > >
> > > > > Best,
> > > > > Junrui
> > > > >
> > > > > On Mon, Jun 29, 2026 at 11:09 AM wang wrote:
> > > > > >
> > > > > > Hi Jingsong:
> > > > > >
> > > > > > Thanks for raising this discussion!
> > > > > > `blob-external-storage` has been used in our internal cases for
> > > > > > some time. I'd like to share some of our experience.
> > > > > >
> > > > > > The motivation is:
> > > > > > * The basic mechanism "write Blobs somewhere, then store
> > > > > >   descriptors" is suitable for nested blobs, e.g. Array<Blob> and
> > > > > >   Map<Any, Blob>, which are crucial in video processing.
> > > > > > * For some super-large objects (like 500GB), we use the
> > > > > >   `blob-external-storage` mechanism and a DFS lease to implement
> > > > > >   resumable transfer (the blob is written to a single file, and
> > > > > >   the write can be resumed on retry).
> > > > > > * For a single directory, DFS has a restriction of about 200,000
> > > > > >   files. Some datasets are too big to store in a single DFS mount
> > > > > >   point. We extended `blob-external-storage` to support multiple
> > > > > >   paths, like `data-file.external-paths`.
> > > > > >
> > > > > > For management, for datasets that can be stored in a single DFS
> > > > > > mount point, we store external blobs in an internal path, like
> > > > > > `table/partition/bucket-0/blobs`, so that those blob files can be
> > > > > > deleted on drop partition.
> > > > > > But more fine-grained management is hard to implement.
> > > > > >
> > > > > > Overall, I think `blob-external-storage` can be removed. It was
> > > > > > a somewhat temporary solution for us.
> > > > > >
> > > > > > On Mon, Jun 29, 2026 at 10:43 AM Jingsong Li wrote:
> > > > > > >
> > > > > > > Hi everyone,
> > > > > > >
> > > > > > > At present, the blob external storage field mechanism places
> > > > > > > data in a completely unmanaged external environment, which
> > > > > > > contradicts Paimon's design. I am considering removing this
> > > > > > > mechanism.
> > > > > > >
> > > > > > > What do you think? I want to hear if there are any objections
> > > > > > > from the community.
> > > > > > >
> > > > > > > Best,
> > > > > > > Jingsong
