Hi Jingsong,

+1 on removing this mechanism.

I agree with the core concern: blob-external-storage-field / blob-external-storage-path write raw data to a location that lives entirely outside Paimon's management. Orphan-file cleanup doesn't reach that path, and the data isn't tied to snapshot lifecycle or expiration, so we lose the consistency and lifecycle guarantees that are the whole point of a managed lake format. Keeping a side channel of unmanaged files undermines that contract and is a long-term source of correctness and operability problems (leaked files, no GC, unclear ownership on table drop).

A couple of practical notes, not objections:

1. We should check whether anyone is currently relying on it. Since it's a fairly recent and narrow option, I'd lean toward a clean removal, but if there's any known usage a short deprecation window (warn first, remove next release) would be safer.

2. The underlying need (avoiding copying very large BLOBs into the table) is real. It's worth being explicit that the supported path for large blobs remains Paimon-managed BLOB storage, so users have a clear migration target.

Overall, +1 to remove. Thanks for raising it.

Best,
Nicholas Jiang

On 2026/06/29 02:42:31 Jingsong Li wrote:
> Hi everyone,
>
> At present, the bloom external storage field mechanism places data in
> a completely unmanaged external environment, which contradicts
> Paimon's design. I consider removing this mechanism.
>
> What do you think? I want to hear if there are any objections from the
> community.
>
> Best,
> Jingsong
>
