Hi Talat,

That is a great idea.

As I mentioned in my comments on the document, there has been an ongoing
discussion regarding this in Apache Polaris. You can find more details in
this PR (https://github.com/apache/polaris/pull/4613) and the related
documentation (
https://github.com/jbonofre/polaris/blob/12dfea48570d076d4012143e66f02e8b503c4f99/site/content/in-dev/unreleased/directories.md
).

I am curious about where unstructured data support should be scoped. While
Iceberg might be the right place, I wonder if the catalog, or even a third
party, is a more natural fit for managing credential vending and object
access indirection.

Regards,
JB


On Fri, Jun 26, 2026 at 2:52 AM Talat Uyarer via dev <[email protected]>
wrote:

> Hi everyone,
>
> I’d like to open a discussion on a new proposal to better support
> unstructured data in Iceberg.
>
> As tables increasingly need to reference unstructured objects (images,
> video, ML artifacts, PDFs) that are too large to embed, the current
> fallback is to use bare string URI columns. This has a few structural
> problems: it bypasses catalog governance (requiring engines to hold broad
> bucket-level credentials), lacks cross-engine portability, and breaks read
> determinism if the underlying object is overwritten.
>
> There is already an active proposal in the Parquet community to introduce
> a native File logical type for physical files. To address these problems,
> I've drafted a proposal for a FileRef type (struct<path, etag>), which is
> designed to layer directly on top of that work. While Parquet defines the
> physical columnar representation, Iceberg's FileRef handles the
> table-format layer (governance, read determinism, snapshot isolation, and
> access brokering). A physical File column in Parquet will map 1:1 to
> Iceberg's logical FileRef, ensuring a unified standard from the storage
> layer up to the catalog.
>
> The core idea is to shift the responsibility of access control to the
> Iceberg REST Catalog. Instead of granting compute engines direct bucket
> access, the proposal introduces a new object-access endpoint. The catalog
> brokers access by vending short-lived credentials or pre-signed URLs
> strictly for the referenced objects (validated against a new
> fileref.allowed-locations table property).
>
> You can read the full proposal draft here:
> https://s.apache.org/iceberg-fileref
>
> I would love to get your feedback on this approach.
>
> Parquet Proposal:
> https://docs.google.com/document/d/1AiwrstqkwkBoOZqgOkm9JGwSMcNeHyLR7EEj1CVqpZQ/edit?tab=t.0#heading=h.k8qyue4jj4rn
>
> Best,
> Talat Uyarer
>
