Re: [DISCUSS] Table Identifiers in Iceberg View Spec

Jan Kaul Tue, 22 Apr 2025 00:14:35 -0700

Hi Walaa,

thank you for your proposal. If I understood correctly, your proposal is composed of three parts:


- session default catalog as fallback for "default-catalog"

- session default namespace as fallback for "default-namespace"

- Late binding + UUID validation

I have some comments regarding these points.


       1. Session default catalog as fallback for "default-catalog"

Introducing a behavior that depends on the current session setup is, in my opinion, the definition of "non-determinism". You could be running the same query engine and catalog setup on different days, with different default session catalogs (which is rather common), and would be getting different results.

Whereas with the current behavior, the view always produces the same results. The current behavior has some rough edges in very niche use cases, but I think it is solid for most use cases.
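To make the difference concrete, here is a minimal Python sketch of the two fallback rules. The function names and the catalog names ("prod", "dev") are hypothetical and not part of the spec or any engine API, and it assumes a two-part identifier means namespace.table with the catalog missing.

```python
from typing import Optional, Tuple

Identifier = Tuple[str, ...]


def qualify(ident: Identifier, fallback_catalog: str) -> Identifier:
    # Assumption: a two-part identifier is (namespace, table) without a catalog.
    return ident if len(ident) == 3 else (fallback_catalog,) + ident


def resolve_current(ident: Identifier, default_catalog: Optional[str],
                    view_catalog: str) -> Identifier:
    # Current behavior as I read it: an unset "default-catalog" falls back
    # to the catalog the view itself was loaded from.
    return qualify(ident, default_catalog or view_catalog)


def resolve_proposed(ident: Identifier, default_catalog: Optional[str],
                     session_catalog: str) -> Identifier:
    # Proposed behavior: an unset "default-catalog" falls back to whatever
    # catalog the reading session currently has as its default.
    return qualify(ident, default_catalog or session_catalog)


ident = ("db", "orders")

# The view's own catalog is fixed, so every reader resolves the same table.
monday = resolve_current(ident, None, view_catalog="prod")
tuesday = resolve_current(ident, None, view_catalog="prod")

# The session default can change between runs, and the result changes with it.
session_a = resolve_proposed(ident, None, session_catalog="prod")
session_b = resolve_proposed(ident, None, session_catalog="dev")
```

With the first rule both reads resolve to the same identifier; with the second, the same view resolves to two different tables depending on the session.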



       2. Session default namespace as fallback for "default-namespace"

Similar to the above.


       3. Late binding + UUID validation

If I understand it correctly, the current implementation already uses late binding.

Generally, having UUID validation makes the setup more robust, which is great. However, UUID validation still requires us to have a portable table identifier specification. Even if we have the UUIDs of the tables referenced by the view, there simply isn't an interface that lets us use those UUIDs. The catalog interface is defined in terms of table identifiers.

So we always require a working catalog setup and suitable table identifiers to obtain the table metadata. We can use the UUIDs to verify whether we loaded the correct table, but only after we have used some identifier to load it. This means there is no way of using UUIDs without a functioning catalog/identifier setup.
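As a rough sketch of that ordering, in Python: the toy catalog, the TableMetadata class, and load_and_verify below are hypothetical names for illustration, not the actual Iceberg catalog API. The only point is that the lookup is keyed by identifier and the UUID check can only run on what the lookup returned.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Identifier = Tuple[str, ...]


@dataclass(frozen=True)
class TableMetadata:
    table_uuid: str
    location: str


class InMemoryCatalog:
    """Toy catalog. Like the catalog interface, it is keyed by identifier only."""

    def __init__(self, tables: Dict[Identifier, TableMetadata]) -> None:
        self._tables = tables

    def load_table(self, ident: Identifier) -> TableMetadata:
        # There is no load-by-UUID method; an identifier is the only way in.
        return self._tables[ident]


def load_and_verify(catalog: InMemoryCatalog, ident: Identifier,
                    expected_uuid: str) -> TableMetadata:
    # Step 1: resolving the table needs a working catalog and identifier.
    table = catalog.load_table(ident)
    # Step 2: the UUID stored with the view can only confirm or reject
    # the table that step 1 already returned.
    if table.table_uuid != expected_uuid:
        raise ValueError(
            f"{'.'.join(ident)} has UUID {table.table_uuid}, "
            f"expected {expected_uuid}"
        )
    return table


catalog = InMemoryCatalog(
    {("db", "orders"): TableMetadata("uuid-1", "s3://bucket/db/orders")}
)
verified = load_and_verify(catalog, ("db", "orders"), "uuid-1")
```

A mismatching UUID makes load_and_verify raise, which is the hardening part; an identifier that does not resolve fails in step 1 before any UUID is compared.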

In conclusion, I prefer the current behavior for "default-catalog" because it is more deterministic in my opinion. And I think the current spec does a good job for multi-engine table identifier resolution. I see the UUID validation more as an additional hardening strategy.


Thanks

Jan

On 4/21/25 17:38, Walaa Eldin Moustafa wrote:

Thanks Renjie!

The existing spec has some guidance on resolving catalogs on the fly already (to address the case of view text with table identifiers missing the catalog part). The guidance is to use the catalog where the view is stored. But I find this rule hard to interpret or use. The catalog itself is a logical construct, such as a federated catalog that delegates to multiple physical backends (e.g., HMS and REST). In such cases, the catalog (e.g., `my_catalog` in `my_catalog.namespace1.table1`) doesn’t physically store the tables; it only routes requests to underlying stores. Therefore, defaulting identifier resolution based on the catalog where the view is "stored" doesn’t align with how catalogs actually behave in practice.


Thanks,
Walaa.

On Sun, Apr 20, 2025 at 11:17 PM Renjie Liu <[email protected]> wrote:

Hi, Walaa:

Thanks for the proposal.

I've reviewed the doc, but in general I have some concerns with
resolving catalog names on the fly with query engine defined
catalog names. This introduces some flexibility at first glance,
but also makes misconfiguration difficult to explain.

But I agree with one part: we should store the resolved table UUID
in view metadata, as table/view renaming may introduce errors
that are difficult for users to understand.

On Sat, Apr 19, 2025 at 3:02 AM Walaa Eldin Moustafa
<[email protected]> wrote:

Hi Everyone,

Looking forward to keeping up the momentum and closing out the
MV spec as well. I’m hoping we can proceed to a vote next week.

Here is a summary in case that helps. The proposal outlines a
strategy for handling table identifiers in Iceberg view
metadata, with the goal of ensuring correctness, portability,
and engine compatibility. It recommends resolving table
identifiers at read time (late binding) rather than creation
time, and introduces UUID-based validation to maintain
identity guarantees across engines or sessions. It also
revises how default-catalog and default-namespace are handled
(defaulting both to the session context if not explicitly set)
to better align with engine behavior and improve cross-engine
interoperability.

Please let me know your thoughts.

Thanks,
Walaa.

On Wed, Apr 16, 2025 at 2:03 PM Walaa Eldin Moustafa
<[email protected]> wrote:

Thanks Eduard and Sung! I have addressed the comments.

One key point to keep in mind is that catalog names in the
spec refer to logical catalogs—i.e., the first part of a
three-part identifier. These correspond to Spark's
DataSourceV2 catalogs, Trino connectors, and similar
constructs. This is a level of abstraction above physical
catalogs, which are not referenced or used in the view
spec. The reason is that table identifiers in the view
definition/text itself refer to logical catalogs, not
physical ones (since they interface directly with the
engine and not a specific metastore).

Thanks,
Walaa.

On Wed, Apr 16, 2025 at 6:15 AM Sung Yun
<[email protected]> wrote:

Thank you Walaa for the proposal. I think view
portability is a very important topic for us to
continue discussing as it relies on many assumptions
within the data ecosystem for it to function like
you've highlighted well in the document.

I've added a few comments around how this may impact
the permission questions the engines will be asking,
and whether that is the desired behavior.

Sung

On Wed, Apr 16, 2025 at 7:32 AM Eduard Tudenhöfner
<[email protected]> wrote:

Thanks Walaa for tackling this problem. I've added
a few comments to get a better understanding of
what this will look like in the actual implementation.

Eduard

On Tue, Apr 15, 2025 at 7:09 PM Walaa Eldin
Moustafa <[email protected]> wrote:

Hi Everyone,

Starting this thread to resume our discussion
on how to reference table identifiers from
Iceberg metadata, a key aspect of the view
specification, particularly in relation to the
MV (materialized view) extensions.

I had the chance to speak offline with a few
community members to better understand how the
current spec is being interpreted. Those
conversations served as inputs to a new
proposal on how table identifier references
could be represented in metadata.

You can find the proposal here [1]. I look
forward to your feedback and working together
to move this forward so we can finalize the MV
spec as well.

[1]

https://docs.google.com/document/d/1-I2v_OqBgJi_8HVaeH1u2jowghmXoB8XaJLzPBa_Hg8/edit?tab=t.0

Thanks,
Walaa.
