Hi Jingsong,

+1 on removing this mechanism.

I agree with the core concern: blob-external-storage-field / 
blob-external-storage-path write raw data to a location that lives entirely 
outside Paimon's management. Orphan-file cleanup doesn't reach that path, and 
the data isn't tied to snapshot lifecycle or expiration, so we lose the 
consistency and lifecycle guarantees that are the whole point of a managed lake 
format. Keeping a side channel of unmanaged files undermines that contract and 
is a long-term source of correctness and operability problems (leaked files, no 
GC, unclear ownership on table drop).

A couple of practical notes, not objections:

1. We should check whether anyone is currently relying on it. Since it's a 
fairly recent and narrow option, I'd lean toward a clean removal, but if 
there's any known usage a short deprecation window (warn first, remove next 
release) would be safer.
2. The underlying need (avoiding copying very large BLOBs into the table) is 
real. We should state explicitly that Paimon-managed BLOB storage remains the 
supported path for large blobs, so users have a clear migration target.

Overall, +1 to remove. Thanks for raising it.

Best,
Nicholas Jiang

On 2026/06/29 02:42:31 Jingsong Li wrote:
> Hi everyone,
> 
> At present, the blob external storage field mechanism places data in
> a completely unmanaged external environment, which contradicts
> Paimon's design. I am considering removing this mechanism.
> 
> What do you think? I want to hear if there are any objections from the
> community.
> 
> Best,
> Jingsong
> 
