Yes, I have the same understanding. The view catalog is resolved at query time.

As you mentioned before, it's good to distinguish between the physical catalog and its reference used in SQL statements. The important part is that the physical catalog of the view and the tables referenced in its definition stay consistent. You could create a view in a given physical catalog by referring to it as "catalogA", as in your first point. If you then, given a different setup, refer to the same physical catalog as "catalogB" in another session/environment, the view should still work.
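
To illustrate, here is a minimal sketch in Python. It is not Iceberg or engine code: PhysicalCatalog, Session and query_view are hypothetical names made up for this example, a session is modeled as a plain mapping from logical catalog names to physical catalogs, and a view definition is reduced to the one partial table identifier it selects from.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PhysicalCatalog:
    """Hypothetical physical catalog holding view definitions and tables."""
    # Simplification: a view "definition" is just the partial identifier
    # (namespace.table) of the single table it selects from.
    views: Dict[str, str] = field(default_factory=dict)
    tables: Dict[str, List[str]] = field(default_factory=dict)


@dataclass
class Session:
    """Hypothetical engine session: logical catalog name -> physical catalog."""
    catalogs: Dict[str, PhysicalCatalog]

    def query_view(self, identifier: str) -> List[str]:
        # The logical catalog name is taken from the SQL statement at query time.
        catalog_name, view_name = identifier.split(".", 1)
        view_catalog = self.catalogs[catalog_name]
        table_ref = view_catalog.views[view_name]
        # No "default-catalog": the partial reference is resolved against
        # the catalog the view was just loaded from.
        return view_catalog.tables[table_ref]


physical = PhysicalCatalog(
    views={"sales.monthly_orders": "sales.orders"},
    tables={"sales.orders": ["order-1", "order-2"]},
)

# Two environments give the same physical catalog different logical names.
session_a = Session(catalogs={"catalogA": physical})
session_b = Session(catalogs={"catalogB": physical})

rows_a = session_a.query_view("catalogA.sales.monthly_orders")
rows_b = session_b.query_view("catalogB.sales.monthly_orders")
print(rows_a == rows_b)
```

Both sessions end up reading the same table, because the view and the table it references live in the same physical catalog regardless of the name used to reach it.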

I would however rephrase your last point. Late binding applies to the view catalog name and by extension to all partial table references when no "default-catalog" is present. Resolving the view catalog name at query time is not opposed to storing the view metadata in a catalog.
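
As a rough sketch of the rule I have in mind (Python, with a hypothetical helper resolve_table_identifier that is not part of any Iceberg API): a partial table reference in the view's SQL is completed with "default-catalog" when it is set, and otherwise with whatever name the view catalog was referenced by at query time. I am assuming plain dot-separated identifiers with a single-level namespace, and I leave "default-namespace" handling out.

```python
from typing import Optional


def resolve_table_identifier(
    table_ref: str,
    view_catalog: str,
    default_catalog: Optional[str] = None,
) -> str:
    """Return a three-part identifier for a reference found in a view's SQL.

    Assumptions (for illustration only): identifiers are dot-separated,
    namespaces have one level, and a fully qualified reference therefore
    has exactly three parts (catalog.namespace.table).
    """
    parts = table_ref.split(".")
    if len(parts) == 3:
        # Already fully qualified: nothing is bound late.
        return table_ref
    if len(parts) != 2:
        # One-part references would need "default-namespace"; not covered here.
        raise ValueError(f"unsupported table reference: {table_ref!r}")
    # "default-catalog" wins if present; otherwise fall back to the name
    # under which the view catalog was referenced in the query.
    catalog = default_catalog if default_catalog is not None else view_catalog
    return f"{catalog}.{table_ref}"


# Same view, referenced through two different catalog names at query time.
via_a = resolve_table_identifier("sales.orders", view_catalog="catalogA")
via_b = resolve_table_identifier("sales.orders", view_catalog="catalogB")
# With "default-catalog" set in the view metadata, the view catalog name is not used.
pinned = resolve_table_identifier(
    "sales.orders", view_catalog="catalogA", default_catalog="prod"
)
print(via_a, via_b, pinned)
```

The view catalog name is only an input to the resolution here; where the view metadata is stored does not change.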

Or maybe I don't entirely understand what you mean.

Thanks

Jan

On 4/24/25 00:32, Walaa Eldin Moustafa wrote:
Hi Jan,

> The view is executed when it's being referenced in a SQL statement. That statement contains the information for the query engine to resolve the catalog of the view.

If I’m understanding correctly, that means:

* If the view is queried as SELECT * FROM catalogA.namespace.view, then catalogA is considered the view’s catalog.

* If the same view is later queried as SELECT * FROM catalogB.namespace.view (after renaming catalogA to catalogB, and keeping everything else the same), then catalogB becomes the view’s catalog.

Is that interpretation correct? If so, it sounds to me like the catalog is resolved at query time, based on how the view is referenced, not from any stored metadata. That would imply some sort of a late binding behavior (similar to the proposal), as opposed to using some catalog that "stores" the view definition.

Thanks,
Walaa

On Tue, Apr 22, 2025 at 11:01 AM Jan Kaul <jank...@mailbox.org.invalid> wrote:

    Hi Walaa,

    Thanks for clarifying the aspects of non-determinism. Let me try
    to address your questions.

    1. This is my interpretation of the current spec: The view is
    executed when it's being referenced in a SQL statement. That
    statement contains the information for the query engine to resolve
    the catalog of the view. The query engine then uses that
    information to fetch the view metadata from the catalog. It also
    needs to temporarily keep track of which catalog it used to fetch
    the view metadata. It can then use that information to resolve the
    table references in the view's SQL definition in case no default
    catalog is specified.

    2. The important part is that the catalog can be referenced at
    execution time. As long as that's the case I would assume the view
    can be created in any catalog.


    I think your point is really valuable because the current
    specification can lead to some unintuitive behavior. For example
    for the following statement:

    CREATE VIEW catalogA.sales.monthly_orders AS SELECT * from
    sales.orders;

    If the session default catalog is not "catalogA", the
    "sales.orders" in the view query would not be the same as just
    referencing "sales.orders" in a normal SQL statement. This is
    because without a "default-catalog", the catalog name of
    "sales.orders" would default to "catalogA".


    However, I like the current design of the view spec, because it
    has the "closure" property. Because of the fact that the "view
    catalog" has to be known when executing a view, all the
    information required to resolve the table identifiers is contained
    in the view metadata (and the "view catalog"). I think that if you
    make the identifier resolution dependent on external parameters,
    it hinders portability.

    Thanks,

    Jan

    On 4/22/25 18:36, Walaa Eldin Moustafa wrote:
    Hi Jan,

    Thanks for the thoughtful feedback.

    I think it’s important we clarify a key point before going deeper:

    Non-determinism is not caused by session fallback behavior—it’s a
    *fundamental limitation of using table identifiers* alone,
    regardless of whether we use the current rule, the proposed
    fallback to the session’s default catalog, or even early vs. late
    binding.

    The same fully qualified identifier (e.g.,
    catalogA.namespace.table) can resolve to different objects
    depending solely on engine-specific routing logic or catalog
    aliases. So determinism isn’t guaranteed just because an
    identifier is "fully qualified." The only reliable anchor for
    identity is the UUID. That’s why the proposed use of UUIDs is not
    just a hardening strategy. It’s the actual fix for correctness.

    To move the conversation forward, could you help clarify two
    things in the context of the current spec:

    * Where in the metadata is the “view catalog” stored, so that an
    engine knows to fall back to it if default-catalog is null?

    * Are we even allowed to create views in the session's default
    catalog (i.e., without specifying a catalog) in the current
    Iceberg spec?

    These questions are important because if we can’t unambiguously
    recover the "view catalog" from metadata, then defaulting to it
    is problematic. And if views can't be created in the default
    catalog, then the fallback rule doesn’t generalize.

    Thanks,
    Walaa.


    On Tue, Apr 22, 2025 at 3:14 AM Jan Kaul
    <jank...@mailbox.org.invalid> wrote:

        Hi Walaa,

        thank you for your proposal. If I understood correctly, your
        proposal is composed of three parts:

        - session default catalog as fallback for "default-catalog"

        - session default namespace as fallback for "default-namespace"

        - Late binding + UUID validation

        I have some comments regarding these points.


                1. Session default catalog as fallback for
                "default-catalog"

        Introducing a behavior that depends on the current session
        setup is in my opinion the definition of "non-determinism".
        You could be running the same query-engine and catalog-setup
        on different days, with different default session catalogs
        (which is rather common), and would be getting different results.

        Whereas with the current behavior, the view always produces
        the same results. The current behavior has some rough edges
        in very niche use cases, but I think it is solid for most use cases.


                2. Session default namespace as fallback for
                "default-namespace"

        Similar to the above.


                3. Late binding + UUID validation

        If I understand it correctly, the current implementation
        already uses late binding.

        Generally, having UUID validation makes the setup more
        robust. Which is great. However, having UUID validation still
        requires us to have a portable table identifier
        specification. Even if we have the UUIDs of the referenced
        tables from the view, there simply isn't an interface that
        lets us use those UUIDs. The catalog interface is defined in
        terms of table identifiers.

        So we always require a working catalog setup and suitable
        table identifiers to obtain the table metadata. We can use
        the UUIDs to verify if we loaded the correct table. But this
        can only be done after we used some identifier. Which means
        there is no way of using UUIDs without a functioning
        catalog/identifier setup.


        In conclusion, I prefer the current behavior for
        "default-catalog" because it is more deterministic in my
        opinion. And I think the current spec does a good job for
        multi-engine table identifier resolution. I see the UUID
        validation more as an additional hardening strategy.

        Thanks

        Jan

        On 4/21/25 17:38, Walaa Eldin Moustafa wrote:
        Thanks Renjie!

        The existing spec has some guidance on resolving catalogs on
        the fly already (to address the case of view text with table
        identifiers missing the catalog part). The guidance is to
        use the catalog where the view is stored. But I find this
        rule hard to interpret or use. The catalog itself is a
        logical construct—such as a federated catalog that delegates
        to multiple physical backends (e.g., HMS and REST). In such
        cases, the catalog (e.g., `my_catalog` in
        `my_catalog.namespace1.table1`) doesn’t physically store the
        tables; it only routes requests to underlying stores.
        Therefore, defaulting identifier resolution based on the
        catalog where the view is "stored" doesn’t align with how
        catalogs actually behave in practice.

        Thanks,
        Walaa.

        On Sun, Apr 20, 2025 at 11:17 PM Renjie Liu
        <liurenjie2...@gmail.com> wrote:

            Hi, Walaa:

            Thanks for the proposal.

            I've reviewed the doc, but in general I have some
            concerns with resolving catalog names on the fly with
            query engine defined catalog names. This introduces some
            flexibility at first glance, but also makes
            misconfiguration difficult to explain.

            But I agree with one part: we should store the resolved
            table UUIDs in the view metadata, as table/view renaming
            may introduce errors that are difficult for users to
            understand.

            On Sat, Apr 19, 2025 at 3:02 AM Walaa Eldin Moustafa
            <wa.moust...@gmail.com> wrote:

                Hi Everyone,

                Looking forward to keeping up the momentum and
                closing out the MV spec as well. I’m hoping we can
                proceed to a vote next week.

                Here is a summary in case that helps. The proposal
                outlines a strategy for handling table identifiers
                in Iceberg view metadata, with the goal of ensuring
                correctness, portability, and engine compatibility.
                It recommends resolving table identifiers at read
                time (late binding) rather than creation time, and
                introduces UUID-based validation to maintain
                identity guarantees across engines or sessions. It
                also revises how default-catalog and
                default-namespace are handled (defaulting both to
                the session context if not explicitly set) to better
                align with engine behavior and improve cross-engine
                interoperability.

                Please let me know your thoughts.

                Thanks,
                Walaa.



                On Wed, Apr 16, 2025 at 2:03 PM Walaa Eldin Moustafa
                <wa.moust...@gmail.com> wrote:

                    Thanks Eduard and Sung! I have addressed the
                    comments.

                    One key point to keep in mind is that catalog
                    names in the spec refer to logical
                    catalogs—i.e., the first part of a three-part
                    identifier. These correspond to Spark's
                    DataSourceV2 catalogs, Trino connectors, and
                    similar constructs. This is a level of
                    abstraction above physical catalogs, which are
                    not referenced or used in the view spec. The
                    reason is that table identifiers in the view
                    definition/text itself refer to logical
                    catalogs, not physical ones (since they
                    interface directly with the engine and not a
                    specific metastore).

                    Thanks,
                    Walaa.


                    On Wed, Apr 16, 2025 at 6:15 AM Sung Yun
                    <sungwy...@gmail.com> wrote:

                        Thank you Walaa for the proposal. I think
                        view portability is a very important topic
                        for us to continue discussing as it relies
                        on many assumptions within the data
                        ecosystem for it to function, as you've
                        highlighted well in the document.

                        I've added a few comments around how this
                        may impact the permission questions the
                        engines will be asking, and whether that is
                        the desired behavior.

                        Sung

                        On Wed, Apr 16, 2025 at 7:32 AM Eduard
                        Tudenhöfner <etudenhoef...@apache.org> wrote:

                            Thanks Walaa for tackling this problem.
                            I've added a few comments to get a
                            better understanding of what this will
                            look like in the actual implementation.

                            Eduard

                            On Tue, Apr 15, 2025 at 7:09 PM Walaa
                            Eldin Moustafa <wa.moust...@gmail.com>
                            wrote:

                                Hi Everyone,

                                Starting this thread to resume our
                                discussion on how to reference table
                                identifiers from Iceberg metadata, a
                                key aspect of the view
                                specification, particularly in
                                relation to the MV (materialized
                                view) extensions.

                                I had the chance to speak offline
                                with a few community members to
                                better understand how the current
                                spec is being interpreted. Those
                                conversations served as inputs to a
                                new proposal on how table identifier
                                references could be represented in
                                metadata.

                                You can find the proposal here [1].
                                I look forward to your feedback and
                                working together to move this
                                forward so we can finalize the MV
                                spec as well.

                                [1] https://docs.google.com/document/d/1-I2v_OqBgJi_8HVaeH1u2jowghmXoB8XaJLzPBa_Hg8/edit?tab=t.0

                                Thanks,
                                Walaa.
