> If the current model is considered deterministic, do you think
> `default-catalog` and `default-namespace` fields provide enough determinism
> to eliminate the need for UUIDs when storing table identifiers?

I am fine with storing UUIDs for table identifiers in the view. Basically,
view creation resolves all referenced tables/views and records their UUIDs.
View consumers can then validate the tables/views they resolve against the
stored UUIDs and fail the query on a mismatch.

The UUID change doesn't really change the table identifier resolution rule
though. It is more of a safety protection.
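
To make the UUID check concrete, here is a minimal Python sketch of what a
view consumer could do. It is only an illustration under assumptions:
`validate_references`, `StaleReferenceError`, and the identifier-to-UUID
mapping are hypothetical names and layouts, not part of the Iceberg view spec
or any engine API. The resolver is passed in as a callable, so the resolution
rule itself stays untouched and the UUID comparison only acts as a guard
afterwards.

```python
from typing import Callable, Dict, Optional, Tuple

# Identifier parts exactly as written in the view SQL, e.g. ("sales", "orders").
Identifier = Tuple[str, ...]


class StaleReferenceError(Exception):
    """Hypothetical error: a resolved table does not match the stored UUID."""


def validate_references(
    stored_uuids: Dict[Identifier, str],
    resolve_uuid: Callable[[Identifier], Optional[str]],
) -> None:
    """Fail if any referenced table resolves to a different UUID.

    stored_uuids: identifier -> UUID recorded at view creation (assumed layout).
    resolve_uuid: the engine's own resolution rule; returns the UUID of the
        table the identifier resolves to now, or None if it cannot be found.
    """
    for identifier, expected in stored_uuids.items():
        actual = resolve_uuid(identifier)
        if actual != expected:
            name = ".".join(identifier)
            raise StaleReferenceError(
                f"{name}: stored UUID {expected}, resolved UUID {actual}"
            )


# Example: a dict lookup stands in for the engine's resolver.
stored = {("sales", "orders"): "uuid-at-creation"}
current = {("sales", "orders"): "uuid-at-creation"}
validate_references(stored, current.get)  # passes silently

# The table is dropped and recreated under the same name: the name still
# resolves, but the UUID no longer matches, so the query is rejected.
current[("sales", "orders")] = "uuid-after-recreate"
try:
    validate_references(stored, current.get)
except StaleReferenceError as err:
    print(f"query rejected: {err}")
```

Swapping in a different `resolve_uuid` (default-catalog based, view-catalog
based, or session based) leaves the check itself unchanged.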

On Wed, May 7, 2025 at 10:02 PM Walaa Eldin Moustafa <wa.moust...@gmail.com>
wrote:

> Hi Steven,
>
> Thanks for the reply.
>
> > I agree with Dan that we shouldn't solve catalog naming in the Iceberg
> > view spec.
>
> To clarify, I don't believe the proposal is trying to solve catalog
> naming. What it’s doing is simply this:
>
> * Proposing that table names inside views resolve the same way as they do
> elsewhere (e.g., queries).
> * Adopting a model that is already widely used and supported in the
> existing ecosystem, which allows for:
>     -- Renaming catalog aliases
>     -- Swapping catalog implementations behind consistent names
>     -- Having different default catalog names across engines that still
> point to the same underlying tables
>
> These are common patterns in production data lakes. Saying Iceberg views
> cannot operate in those environments feels unrealistic. In practice, it
> means the spec breaks down in situations that users encounter regularly.
>
> > The recommendation of using engines’ current catalog and database can
> > cause context-dependent resolution results.
>
> * As noted in the doc and earlier replies, fixing a catalog name doesn’t
> actually guarantee determinism either. All the failure scenarios above
> still apply even when a default-catalog is stored.
> * The current spec also allows default-catalog to be null, in which case
> it falls back to the view’s catalog, yet that catalog is determined based
> on how the view is referenced in the query, which would be considered
> non-deterministic based on the same criteria you shared.
> * The only true form of determinism here is UUID-based validation, which
> protects against silent drift in any resolution model.
>
> If the current model is considered deterministic, do you think
> `default-catalog` and `default-namespace` fields provide enough determinism
> to eliminate the need for UUIDs when storing table identifiers?
> Or put another way: Would you be comfortable relying solely on
> default-catalog + default-namespace + table name to re-identify the correct
> table, without UUID validation?
>
> +1 on involving other communities. I’m happy to help facilitate a
> cross-community discussion if we aren’t able to reach a resolution here.
>
> Thanks,
> Walaa.
>
>
>
> On Wed, May 7, 2025 at 9:20 PM Steven Wu <stevenz...@gmail.com> wrote:
>
>> I agree with Dan that we shouldn't solve catalog naming in the Iceberg
>> view spec. I am not convinced that the proposed change will make the table
>> identifier resolution more clear and portable. The recommendation of using
>> engines' current catalog and database can cause context dependent
>> resolution results, which seems non-deterministic to me.
>>
>> Walaa, you raised a point in the doc that the current catalog resolution
>> logic (default-catalog field, then view catalog) is challenging and
>> unrealistic for engines (like Spark and Trino). It will be great to get
>> more inputs from the broader community on this part.
>>
>>
>> On Tue, May 6, 2025 at 9:21 AM Benny Chow <btc...@gmail.com> wrote:
>>
>>> In Spark, I believe that the USE command sets the current catalog and
>>> namespace.  This affects both where the view is created and how unqualified
>>> table identifiers are resolved.  I also don't see an issue with saving the
>>> current catalog and namespace into the view metadata's default-catalog and
>>> default-namespace fields.
>>>
>>> On Wed, Apr 30, 2025 at 5:12 PM Walaa Eldin Moustafa <
>>> wa.moust...@gmail.com> wrote:
>>>
>>>> > I think that's the lesser evil compared to Iceberg specifying how
>>>> engines should resolve identifiers
>>>>
>>>> I think this is also similar to the previous point. It is the other way
>>>> around. Right now the spec dictates how to resolve (through employing a
>>>> view-specific `default-catalog` field). The proposal is suggesting to get
>>>> out of this space and let engines handle it similar to how they handle all
>>>> identifiers.
>>>>
>>>> On Wed, Apr 30, 2025 at 5:07 PM Walaa Eldin Moustafa <
>>>> wa.moust...@gmail.com> wrote:
>>>>
>>>>> > I thought "default-catalog" could be set via the USE command.
>>>>>
>>>>> Benny, I think this is a misconception or miscommunication. The USE
>>>>> command has no impact on the `default-catalog` field. In fact, the
>>>>> proposal's direction is exactly to establish that the USE command should
>>>>> influence how tables are resolved, the same as everywhere else. Right now
>>>>> that is not the case under the current spec.
>>>>>
>>>>>
>>>>> On Wed, Apr 30, 2025 at 3:17 PM Benny Chow <btc...@gmail.com> wrote:
>>>>>
>>>>>> > there is no SQL construct today to explicitly set default-catalog
>>>>>>
>>>>>> I thought "default-catalog" could be set via the USE command.
>>>>>>
>>>>>> I generally agree with Dan about requiring consistent catalog names.
>>>>>> I think that's the lesser evil compared to Iceberg specifying how engines
>>>>>> should resolve identifiers.  Another thing to consider is that identifier
>>>>>> resolution can be very expensive at query validation time if identifiers
>>>>>> need to be looked up from a bunch of places.  Hopefully, it should be
>>>>>> possible to define a view in such a way that identifiers can be resolved
>>>>>> on the first try.
>>>>>>
>>>>>> Benny
>>>>>>
>>>>>> On Tue, Apr 29, 2025 at 10:29 PM Walaa Eldin Moustafa <
>>>>>> wa.moust...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Rishabh,
>>>>>>>
>>>>>>> You're right that the proposal touches on two aspects, and
>>>>>>> resolution rules are one of them. The other aspect is the proposal's
>>>>>>> position that table identifiers should be stored in metadata exactly as
>>>>>>> they appear in the view text (e.g., even if they're two-part or
>>>>>>> partially qualified), along with their corresponding UUIDs for
>>>>>>> validation. This applies both to referenced input tables and the storage
>>>>>>> table identifier in materialized views.
>>>>>>>
>>>>>>> We may be able to converge on this storage format even if we haven't
>>>>>>> yet converged on the resolution fallback rules. I believe both
>>>>>>> resolution strategies currently being discussed would still lead to
>>>>>>> storing identifiers in this way.
>>>>>>>
>>>>>>> I'm supportive of moving forward with consensus on the identifier
>>>>>>> storage format. That said, we may continue to run into questions
>>>>>>> related to resolution during implementation. For example: Should the
>>>>>>> storage table identifier follow the same default-catalog and
>>>>>>> default-namespace resolution behavior as other table references?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Walaa.
>>>>>>>
>>>>>>> On Tue, Apr 29, 2025 at 10:07 PM Rishabh Bhatia <
>>>>>>> bhatiarishab...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hello Walaa,
>>>>>>>>
>>>>>>>> Thanks for starting this discussion.
>>>>>>>>
>>>>>>>> I think we should decouple at least the MV Spec from the proposal
>>>>>>>> to change the current behavior of view resolution.
>>>>>>>>
>>>>>>>> We can continue discussing whether the current view spec needs to
>>>>>>>> change. Based on that decision, we can update the view resolution
>>>>>>>> rule later if required.
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Rishabh
>>>>>>>>
>>>>>>>> On Mon, Apr 28, 2025 at 3:22 PM Walaa Eldin Moustafa <
>>>>>>>> wa.moust...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Correction of typo: both engines seem to set default-catalog to
>>>>>>>>> the view catalog if it is defined, or to null if the view catalog is
>>>>>>>>> not defined.
>>>>>>>>>
>>>>>>>>> On Mon, Apr 28, 2025 at 3:06 PM Walaa Eldin Moustafa <
>>>>>>>>> wa.moust...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Dan,
>>>>>>>>>>
>>>>>>>>>> Thanks again for your response.
>>>>>>>>>>
>>>>>>>>>> I agree that catalog renaming is an environmental event, but it's
>>>>>>>>>> a real one that happens frequently in practice.
>>>>>>>>>> Saying that the Iceberg spec cannot accommodate something as
>>>>>>>>>> common as catalog renaming feels very restrictive, and could make
>>>>>>>>>> the spec less practical, even unusable, for real-world deployments.
>>>>>>>>>> I’m sharing this from the perspective of a large data lake
>>>>>>>>>> environment where views are heavily deployed and operationalized.
>>>>>>>>>>
>>>>>>>>>> Further, it's worth noting that the table spec is resilient to
>>>>>>>>>> catalog renaming, but the view spec is not. If we have an
>>>>>>>>>> opportunity to make the view spec similarly resilient, I wonder why
>>>>>>>>>> not?
>>>>>>>>>> Both specifications are deterministic in their definition, but
>>>>>>>>>> one is more fragile to environmental changes than the other.
>>>>>>>>>> Improving resilience does not sacrifice determinism. It simply makes
>>>>>>>>>> views safer and more portable over time.
>>>>>>>>>>
>>>>>>>>>> Separately, given that there is no SQL construct today to
>>>>>>>>>> explicitly set default-catalog at creation time, what is the
>>>>>>>>>> intuition behind how engines like Spark and Trino currently assign
>>>>>>>>>> default-catalog?
>>>>>>>>>> Today, both engines seem to set default-catalog to null if the
>>>>>>>>>> view catalog is defined, or to the view catalog if not.
>>>>>>>>>> What was the intended thought process behind this behavior?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Walaa
>>>>>>>>>>
>>>>>>>>>> On Mon, Apr 28, 2025 at 1:33 PM Daniel Weeks <dwe...@apache.org>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Walaa,
>>>>>>>>>>>
>>>>>>>>>>> > tables inside views remain reachable after a catalog rename
>>>>>>>>>>>
>>>>>>>>>>> This problem stems from the exact environmental/configuration
>>>>>>>>>>> issue that we should not be trying to address.  I don't think we
>>>>>>>>>>> would expect references to survive a catalog rename.  That's not
>>>>>>>>>>> something covered by the spec and needs to be handled separately as
>>>>>>>>>>> a platform-level migration specific to the affected environment.
>>>>>>>>>>>
>>>>>>>>>>> The identifier resolution logic is clear and deterministic.  It
>>>>>>>>>>> should not matter whether an engine resolves and encodes the
>>>>>>>>>>> default-catalog or leaves it to the resolution rules.
>>>>>>>>>>>
>>>>>>>>>>> The issue isn't with how the spec is defined, but rather view
>>>>>>>>>>> behavior when you start altering the environment around it, which
>>>>>>>>>>> isn't something we should be trying to define here.
>>>>>>>>>>>
>>>>>>>>>>> -Dan
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Apr 28, 2025 at 12:17 PM Walaa Eldin Moustafa <
>>>>>>>>>>> wa.moust...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Dan,
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for chiming in.
>>>>>>>>>>>>
>>>>>>>>>>>> I believe the issues we’re seeing now go beyond just catalog
>>>>>>>>>>>> naming consistency. The behavior around default-catalog itself
>>>>>>>>>>>> introduces resolution inconsistencies even when catalog names are
>>>>>>>>>>>> consistent.
>>>>>>>>>>>> For example:
>>>>>>>>>>>>
>>>>>>>>>>>> * When default-catalog is set to null, tables inside views
>>>>>>>>>>>> remain reachable after a catalog rename. But if it is set to a
>>>>>>>>>>>> non-null value, table references will break.
>>>>>>>>>>>>
>>>>>>>>>>>> * default-catalog causes table references inside views to be
>>>>>>>>>>>> early bound (i.e., bound at view creation time, especially when
>>>>>>>>>>>> using a non-null value), while table references inside standalone
>>>>>>>>>>>> queries are late bound (bound at query time). This creates
>>>>>>>>>>>> inconsistencies when resolving the same table name inside and
>>>>>>>>>>>> outside views, even within the same job.
>>>>>>>>>>>>
>>>>>>>>>>>> * It causes Spark's and Trino's behavior to drift from the spec.
>>>>>>>>>>>> There is no way to fully align Spark's behavior without making
>>>>>>>>>>>> invasive changes to the Spark SQL grammar and the View DataSource
>>>>>>>>>>>> API (specifically on the CREATE side). This challenge would extend
>>>>>>>>>>>> to other engines too. Both Spark and Trino set this field based on
>>>>>>>>>>>> a heuristic in today's implementation.
>>>>>>>>>>>>
>>>>>>>>>>>> * With view nesting (views depending on views), these
>>>>>>>>>>>> inconsistencies amplify further, forcing users and engines to
>>>>>>>>>>>> reason about catalog resolution at every level in the view tree.
>>>>>>>>>>>>
>>>>>>>>>>>> * It will be difficult to migrate Hive views to Iceberg with
>>>>>>>>>>>> that model. Migrated Hive views will have to deviate from that spec.
>>>>>>>>>>>>
>>>>>>>>>>>> How would you suggest approaching the engine-level changes
>>>>>>>>>>>> required to support the current default-catalog field?
>>>>>>>>>>>> Also, do you believe the Spark and Trino communities would
>>>>>>>>>>>> align around having table resolution behave inconsistently between
>>>>>>>>>>>> queries and views, or inconsistently between Iceberg and other
>>>>>>>>>>>> types of views?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Walaa
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Apr 28, 2025 at 11:34 AM Daniel Weeks <
>>>>>>>>>>>> dwe...@apache.org> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I would agree with Jan's summary of why 'default-catalog' was
>>>>>>>>>>>>> introduced, but I think we need to step back and align on what we
>>>>>>>>>>>>> are really attempting to support in the spec.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The issues we're discussing largely stem from using multiple
>>>>>>>>>>>>> engines with cross-catalog references and configurations where
>>>>>>>>>>>>> catalog names are not aligned.  If we have multiple engines that
>>>>>>>>>>>>> all have the same catalog names/configurations, the current spec
>>>>>>>>>>>>> implementation is well defined for table resolution even across
>>>>>>>>>>>>> catalogs.  The 'default-catalog' (and namespace equivalent) was
>>>>>>>>>>>>> intended to address the resolution within the context of the SQL
>>>>>>>>>>>>> text, not to address catalog/naming inconsistencies.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I feel like we're trying to adapt the original intent to
>>>>>>>>>>>>> address the catalog naming/configuration and would argue that we
>>>>>>>>>>>>> shouldn't attempt to do that as part of the spec.  Inconsistently
>>>>>>>>>>>>> named catalogs are a reality, but we should consider that a
>>>>>>>>>>>>> configuration/environmental issue, not something to solve for in
>>>>>>>>>>>>> the spec.
>>>>>>>>>>>>>
>>>>>>>>>>>>> We should support and advocate for consistency in catalog
>>>>>>>>>>>>> naming and define the spec along those lines.  The fact is that
>>>>>>>>>>>>> with all of the recent work that's gone into making catalogs
>>>>>>>>>>>>> pluggable, it makes more sense to just register catalog
>>>>>>>>>>>>> configuration with consistent names (even if you have to duplicate
>>>>>>>>>>>>> the configuration for supporting existing readers/writers).  I
>>>>>>>>>>>>> think it's better to provide a path toward consistency than to
>>>>>>>>>>>>> normalize complicated schemes to work around the issues caused by
>>>>>>>>>>>>> environmental/configuration inconsistencies.
>>>>>>>>>>>>>
>>>>>>>>>>>>> If the goal is to create clever ways to hack the late binding
>>>>>>>>>>>>> resolution to swap in different catalogs or make references
>>>>>>>>>>>>> contextual, I feel like that is something we should strongly
>>>>>>>>>>>>> discourage as it leads to confusion about what is resolved as part
>>>>>>>>>>>>> of the query.
>>>>>>>>>>>>>
>>>>>>>>>>>>> At this point, I don't see a good argument to add
>>>>>>>>>>>>> additional configuration or change the resolution behaviors.
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Dan
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Apr 28, 2025 at 12:40 AM Jan Kaul
>>>>>>>>>>>>> <jank...@mailbox.org.invalid> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I think the intention with the "default-catalog" was that
>>>>>>>>>>>>>> every query engine uses it to store its session default catalog
>>>>>>>>>>>>>> at the time of creating the view. This way the view could be
>>>>>>>>>>>>>> reused in another session. The idea was not to introduce an
>>>>>>>>>>>>>> additional SQL syntax to set the default-catalog.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Generally we have different environments we want to support
>>>>>>>>>>>>>> with the view spec:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1. Consistent catalog naming
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> When the environment supports it, using consistent catalog
>>>>>>>>>>>>>> names can have a great benefit for multi-catalog, multi-engine
>>>>>>>>>>>>>> setups. With consistent catalog names, using the
>>>>>>>>>>>>>> "default-catalog" field works without any issues.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2. Inconsistent catalog naming
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This can be the case when different query engines refer to
>>>>>>>>>>>>>> the same physical catalog by different names. This often happens
>>>>>>>>>>>>>> because different query engines use different strategies to set
>>>>>>>>>>>>>> up the catalogs. If catalogs have inconsistent naming, using the
>>>>>>>>>>>>>> "default-catalog" field does not work because it is not
>>>>>>>>>>>>>> guaranteed that the catalog name can be resolved with another
>>>>>>>>>>>>>> engine. Using the "view catalog" as a fallback is a better
>>>>>>>>>>>>>> solution for this use case, as it avoids catalog names
>>>>>>>>>>>>>> altogether. It is however limited to table references in the
>>>>>>>>>>>>>> same catalog.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> What do you think of introducing a view property that
>>>>>>>>>>>>>> specifies if the "default-catalog" or the "view catalog" should
>>>>>>>>>>>>>> be used? This way, you could use the "default-catalog" in
>>>>>>>>>>>>>> environments where you can guarantee consistent naming, but you
>>>>>>>>>>>>>> would be able to directly fall back to the "view catalog" when
>>>>>>>>>>>>>> you don't have consistent naming. The query engines could set the
>>>>>>>>>>>>>> default for this view property at creation time. Spark for
>>>>>>>>>>>>>> example could set it to automatically use the "view catalog".
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Jan
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 4/26/25 05:33, Walaa Eldin Moustafa wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> To help folks catch up on the latest discussions and
>>>>>>>>>>>>>> interpretation of the spec, I have summarized everything we
>>>>>>>>>>>>>> discussed so far at the top of the proposal document (here
>>>>>>>>>>>>>> <https://docs.google.com/document/d/1-I2v_OqBgJi_8HVaeH1u2jowghmXoB8XaJLzPBa_Hg8/edit?tab=t.0>).
>>>>>>>>>>>>>> I have slightly updated the proposal to be in sync with the new
>>>>>>>>>>>>>> interpretation to avoid confusion. In summary:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> * Remove default-catalog and default-namespace fields from
>>>>>>>>>>>>>> the view spec completely.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> * Hence, we do not attempt to define separate view-level
>>>>>>>>>>>>>> default catalogs or namespaces.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Instead:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> * If a table identifier inside a view lacks a catalog
>>>>>>>>>>>>>> qualifier, engines should resolve it using the current engine
>>>>>>>>>>>>>> catalog at query time.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> * Reference table identifiers in the metadata exactly as they
>>>>>>>>>>>>>> appear in the view SQL text.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> * If an identifier lacks the catalog part at creation, it
>>>>>>>>>>>>>> should still lack a catalog in the stored metadata.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> * Store UUIDs alongside table identifiers whenever possible.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Walaa.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Apr 25, 2025 at 5:18 PM Walaa Eldin Moustafa <
>>>>>>>>>>>>>> wa.moust...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks for the contribution Benny! +1 to the confusion the
>>>>>>>>>>>>>>> fallback creates. Also just to be clear, at this point and
>>>>>>>>>>>>>>> after clarifying the current spec intentions, I am convinced
>>>>>>>>>>>>>>> that we should remove the default catalog and default namespace
>>>>>>>>>>>>>>> fields altogether.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Walaa.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, Apr 25, 2025 at 5:13 PM Benny Chow <btc...@gmail.com>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I'd like to contribute my opinions on this:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> - I don't particularly like the current behavior of
>>>>>>>>>>>>>>>> "default to the view's catalog when default-catalog is not
>>>>>>>>>>>>>>>> set". Fundamentally, I believe the intent of default-catalog
>>>>>>>>>>>>>>>> and default-namespace is to help users write more concise SQL.
>>>>>>>>>>>>>>>> - The Spark session catalog is engine specific and I don't
>>>>>>>>>>>>>>>> think we should design something that says first use this
>>>>>>>>>>>>>>>> catalog, then that catalog, or that catalog.  For example,
>>>>>>>>>>>>>>>> resolving identifiers using default-catalog -> view's catalog
>>>>>>>>>>>>>>>> -> session catalog is not good.
>>>>>>>>>>>>>>>> - We gotta support non-Iceberg tables, otherwise I see no
>>>>>>>>>>>>>>>> value in putting views in the catalog to share with other
>>>>>>>>>>>>>>>> engines.
>>>>>>>>>>>>>>>> - Interoperability between different engine types is very
>>>>>>>>>>>>>>>> hard due to dialect issues, so I think we should focus on
>>>>>>>>>>>>>>>> supporting different clusters of the same engine type on a
>>>>>>>>>>>>>>>> shared catalog.  For example, AI and BI clusters on Spark
>>>>>>>>>>>>>>>> sharing the same views in a REST catalog.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Coincidentally, I think the ultimate solution is along the
>>>>>>>>>>>>>>>> lines of something Russell proposed last year:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> https://lists.apache.org/thread/hoskfx8y3kvrcww52l4w9dxghp3pnlm7
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> We've been looking at this interoperable identifier problem
>>>>>>>>>>>>>>>> through the lens of catalog resolution but maybe the right
>>>>>>>>>>>>>>>> approach is really about templating.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I would extend Russell's idea to allow identifiers in a
>>>>>>>>>>>>>>>> view to span catalogs to support non-Iceberg tables.  Also,
>>>>>>>>>>>>>>>> the default-catalog property could be templated as well.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thoughts?
>>>>>>>>>>>>>>>> Benny
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Fri, Apr 25, 2025 at 4:02 PM Walaa Eldin Moustafa <
>>>>>>>>>>>>>>>> wa.moust...@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks Steven! How do you recommend making the Spark
>>>>>>>>>>>>>>>>> implementation conform to the spec? Do we need Spark SQL
>>>>>>>>>>>>>>>>> extensions and/or Spark catalog APIs for that?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> How do you recommend reconciling the inconsistencies I
>>>>>>>>>>>>>>>>> shared regarding many resolution methods not consistently
>>>>>>>>>>>>>>>>> being followed in different scenarios (view vs child table
>>>>>>>>>>>>>>>>> resolution, query vs view resolution)? Note these occur when
>>>>>>>>>>>>>>>>> the default catalog is set to a non-null value. If it helps, I
>>>>>>>>>>>>>>>>> can share concrete examples.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> Walaa.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Fri, Apr 25, 2025 at 3:52 PM Steven Wu <
>>>>>>>>>>>>>>>>> stevenz...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The core issue is the fallback behavior when
>>>>>>>>>>>>>>>>>> `default-catalog` is not defined. The current view spec says
>>>>>>>>>>>>>>>>>> the fallback should be the catalog where the view is
>>>>>>>>>>>>>>>>>> defined. It doesn't really matter what the catalog is named
>>>>>>>>>>>>>>>>>> (catalogX) by the read engine.
>>>>>>>>>>>>>>>>>> - If a view refers to the tables in the same catalog, this
>>>>>>>>>>>>>>>>>> is a non-ambiguous and reasonable fallback behavior.
>>>>>>>>>>>>>>>>>> - If a view refers to tables from another catalog, catalog
>>>>>>>>>>>>>>>>>> names should be included in the reference name already. So
>>>>>>>>>>>>>>>>>> no ambiguity there either.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Potential inconsistent naming of catalogs is a separate
>>>>>>>>>>>>>>>>>> problem, which the Iceberg view spec probably cannot solve.
>>>>>>>>>>>>>>>>>> We can only recommend that catalogs be named consistently
>>>>>>>>>>>>>>>>>> across usage for better interoperability on name references.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> This proposal is to change the fallback behavior to the
>>>>>>>>>>>>>>>>>> engine's session default catalog. I am not sure it is better
>>>>>>>>>>>>>>>>>> than the current fallback behavior.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> > Today’s Spark behavior explicitly differs from this
>>>>>>>>>>>>>>>>>> > idea. Spark resolves table identifiers during view creation
>>>>>>>>>>>>>>>>>> > using the session’s default catalog, not a supplied
>>>>>>>>>>>>>>>>>> > `default-catalog`.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I would argue that is a Spark implementation issue for
>>>>>>>>>>>>>>>>>> not conforming to the spec.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Fri, Apr 25, 2025 at 1:17 PM Walaa Eldin Moustafa
>>>>>>>>>>>>>>>>>> <wa.moust...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > Hi Jan,
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > Thanks again for continuing the discussion. I want to
>>>>>>>>>>>>>>>>>> > highlight a few fundamental issues around the interpretation
>>>>>>>>>>>>>>>>>> > of default-catalog:
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > Here is the real catch:
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > * default-catalog cannot logically be defined at view
>>>>>>>>>>>>>>>>>> > creation time. It would be circular: the view needs to exist
>>>>>>>>>>>>>>>>>> > before its metadata (and hence default-catalog) can exist.
>>>>>>>>>>>>>>>>>> > This is visible in Spark’s implementation, where
>>>>>>>>>>>>>>>>>> > `default-catalog` is not used.
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > * Introducing a creation-time default-catalog setting
>>>>>>>>>>>>>>>>>> > would require extending SQL syntax and engine APIs to
>>>>>>>>>>>>>>>>>> > promote it to a first-class view concept. This would be
>>>>>>>>>>>>>>>>>> > intrusive, non-intuitive, and realistically very difficult
>>>>>>>>>>>>>>>>>> > to standardize across engines.
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > * Today’s Spark behavior explicitly differs from this
>>>>>>>>>>>>>>>>>> > idea. Spark resolves table identifiers during view creation
>>>>>>>>>>>>>>>>>> > using the session’s default catalog, not a supplied
>>>>>>>>>>>>>>>>>> > `default-catalog`.
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > * Hypothetically even if we patched in a creation-time
>>>>>>>>>>>>>>>>>> > default-catalog, it would create an inconsistent binding
>>>>>>>>>>>>>>>>>> > model between tables vs views (early vs late), and between
>>>>>>>>>>>>>>>>>> > tables in views and in queries (again early vs late). For
>>>>>>>>>>>>>>>>>> > example, views and tables in queries can withstand default
>>>>>>>>>>>>>>>>>> > catalog renames, but tables cannot when they are used inside
>>>>>>>>>>>>>>>>>> > views -- it even applies to views inside views, which makes
>>>>>>>>>>>>>>>>>> > this very hard to reason about considering nesting.
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > Thanks,
>>>>>>>>>>>>>>>>>> > Walaa
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > On Fri, Apr 25, 2025 at 7:00 AM Jan Kaul
>>>>>>>>>>>>>>>>>> > <jank...@mailbox.org.invalid> wrote:
>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>> >> @Walaa:
>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>> >> I would argue that when you run a CREATE VIEW
>>>>>>>>>>>>>>>>>> >> statement the query engine knows which catalog the view is
>>>>>>>>>>>>>>>>>> >> being created in. So even though we typically use late
>>>>>>>>>>>>>>>>>> >> binding to resolve the view catalog at query time, it can
>>>>>>>>>>>>>>>>>> >> also be used at creation time.
>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>> >> The query engine would need to keep track of the "view
>>>>>>>>>>>>>>>>>> >> catalog" where the view is going to be created. It can use
>>>>>>>>>>>>>>>>>> >> that catalog to resolve partial table identifiers if
>>>>>>>>>>>>>>>>>> >> "default-catalog" is not set.
>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>> >> It can lead to some unintuitive behavior, where
>>>>>>>>>>>>>>>>>> >> partial identifiers in the view query resolve to a
>>>>>>>>>>>>>>>>>> >> different catalog compared to using them outside of a view.
>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>> >> CREATE VIEW catalogA.sales.monthly_orders AS SELECT *
>>>>>>>>>>>>>>>>>> >> FROM sales.orders;
>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>> >> If the session default catalog is not "catalogA", the
>>>>>>>>>>>>>>>>>> >> "sales.orders" in the view query would not be the same as
>>>>>>>>>>>>>>>>>> >> just referencing "sales.orders" in a normal SQL statement.
>>>>>>>>>>>>>>>>>> >> This is because without a "default-catalog", the catalog
>>>>>>>>>>>>>>>>>> >> name of "sales.orders" would default to "catalogA", which
>>>>>>>>>>>>>>>>>> >> is the view's catalog.
>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>> >> Thanks,
>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>> >> Jan
>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>> >> On 4/25/25 04:05, Manu Zhang wrote:
>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>> >>> For example, if we want to validate that the tables
>>>>>>>>>>>>>>>>>> >>> referenced in the view exist, how can we do that when
>>>>>>>>>>>>>>>>>> >>> default-catalog isn't defined, since the view hasn't been
>>>>>>>>>>>>>>>>>> >>> created or loaded yet?
>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>> >> I don't think this is related to the view spec. How do we
>>>>>>>>>>>>>>>>>> >> validate that a table exists without a default catalog, or
>>>>>>>>>>>>>>>>>> >> do we always use the current session catalog?
>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>> >> Thanks,
>>>>>>>>>>>>>>>>>> >> Manu
>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>> >> On Fri, Apr 25, 2025 at 5:59 AM Walaa Eldin Moustafa <
>>>>>>>>>>>>>>>>>> wa.moust...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>> >>> Hi Jan,
>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>> >>> I think we still share the same understanding. Just
>>>>>>>>>>>>>>>>>> >>> to clarify: when I referred to late binding as “similar”
>>>>>>>>>>>>>>>>>> >>> to the proposal, I was acknowledging the distinction
>>>>>>>>>>>>>>>>>> >>> between view-level and table-level resolution. But as you
>>>>>>>>>>>>>>>>>> >>> noted, both follow a late binding model.
>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>> >>> That said, this still raises an interesting question
>>>>>>>>>>>>>>>>>> and a potential gap: if default-catalog is only defined at 
>>>>>>>>>>>>>>>>>> query time, how
>>>>>>>>>>>>>>>>>> should resolution work during view creation? For example, if 
>>>>>>>>>>>>>>>>>> we want to
>>>>>>>>>>>>>>>>>> validate that the tables referenced in the view exist, how 
>>>>>>>>>>>>>>>>>> can we do that
>>>>>>>>>>>>>>>>>> when default-catalog isn't defined, since the view hasn't 
>>>>>>>>>>>>>>>>>> been created or
>>>>>>>>>>>>>>>>>> loaded yet?
>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>> >>> Thanks,
>>>>>>>>>>>>>>>>>> >>> Walaa.
>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>> >>> On Thu, Apr 24, 2025 at 7:02 AM Jan Kaul
>>>>>>>>>>>>>>>>>> <jank...@mailbox.org.invalid> wrote:
>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>> >>>> Yes, I have the same understanding. The view catalog
>>>>>>>>>>>>>>>>>> is resolved at query time.
>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>> >>>> As you mentioned before, it's good to distinguish
>>>>>>>>>>>>>>>>>> between the physical catalog and its reference used in SQL 
>>>>>>>>>>>>>>>>>> statements. The
>>>>>>>>>>>>>>>>>> important part is that the physical catalog of the view and 
>>>>>>>>>>>>>>>>>> the tables
>>>>>>>>>>>>>>>>>> referenced in its definition stay consistent. You could 
>>>>>>>>>>>>>>>>>> create a view in a
>>>>>>>>>>>>>>>>>> given physical catalog by referring to it as "catalogA", as 
>>>>>>>>>>>>>>>>>> in your first
>>>>>>>>>>>>>>>>>> point. If you then, given a different setup, refer to the 
>>>>>>>>>>>>>>>>>> same physical
>>>>>>>>>>>>>>>>>> catalog as "catalogB" in another session/environment, the 
>>>>>>>>>>>>>>>>>> behavior should
>>>>>>>>>>>>>>>>>> still work.
>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>> >>>> I would however rephrase your last point. Late
>>>>>>>>>>>>>>>>>> binding applies to the view catalog name and by extension to 
>>>>>>>>>>>>>>>>>> all partial
>>>>>>>>>>>>>>>>>> table references when no "default-catalog" is present. 
>>>>>>>>>>>>>>>>>> Resolving the view
>>>>>>>>>>>>>>>>>> catalog name at query time is not opposed to storing the 
>>>>>>>>>>>>>>>>>> view metadata in a
>>>>>>>>>>>>>>>>>> catalog.
>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>> >>>> Or maybe I don't entirely understand what you mean.
>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>> >>>> Thanks
>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>> >>>> Jan
>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>> >>>> On 4/24/25 00:32, Walaa Eldin Moustafa wrote:
>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>> >>>> Hi Jan,
>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>> >>>> > The view is executed when it's being referenced in
>>>>>>>>>>>>>>>>>> a SQL statement. That statement contains the information for 
>>>>>>>>>>>>>>>>>> the query
>>>>>>>>>>>>>>>>>> engine to resolve the catalog of the view.
>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>> >>>> If I’m understanding correctly, that means:
>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>> >>>> * If the view is queried as SELECT * FROM
>>>>>>>>>>>>>>>>>> catalogA.namespace.view, then catalogA is considered the 
>>>>>>>>>>>>>>>>>> view’s catalog.
>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>> >>>> * If the same view is later queried as SELECT * FROM
>>>>>>>>>>>>>>>>>> catalogB.namespace.view (after renaming catalogA to 
>>>>>>>>>>>>>>>>>> catalogB, and keeping
>>>>>>>>>>>>>>>>>> everything else the same), then catalogB becomes the view’s 
>>>>>>>>>>>>>>>>>> catalog.
>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>> >>>> Is that interpretation correct? If so, it sounds to
>>>>>>>>>>>>>>>>>> me like the catalog is resolved at query time, based on how 
>>>>>>>>>>>>>>>>>> the view is
>>>>>>>>>>>>>>>>>> referenced, not from any stored metadata. That would imply 
>>>>>>>>>>>>>>>>>> some sort of a
>>>>>>>>>>>>>>>>>> late binding behavior (similar to the proposal), as opposed 
>>>>>>>>>>>>>>>>>> to using some
>>>>>>>>>>>>>>>>>> catalog that "stores" the view definition.
>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>> >>>> Thanks,
>>>>>>>>>>>>>>>>>> >>>> Walaa
>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>> >>>> On Tue, Apr 22, 2025 at 11:01 AM Jan Kaul
>>>>>>>>>>>>>>>>>> <jank...@mailbox.org.invalid> wrote:
>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>> >>>>> Hi Walaa,
>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>> >>>>> Thanks for clarifying the aspects of
>>>>>>>>>>>>>>>>>> non-determinism. Let me try to address your questions.
>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>> >>>>> 1. This is my interpretation of the current spec:
>>>>>>>>>>>>>>>>>> The view is executed when it's being referenced in a SQL 
>>>>>>>>>>>>>>>>>> statement. That
>>>>>>>>>>>>>>>>>> statement contains the information for the query engine to 
>>>>>>>>>>>>>>>>>> resolve the
>>>>>>>>>>>>>>>>>> catalog of the view. The query engine then uses that 
>>>>>>>>>>>>>>>>>> information to fetch
>>>>>>>>>>>>>>>>>> the view metadata from the catalog. It also needs to 
>>>>>>>>>>>>>>>>>> temporarily keep track
>>>>>>>>>>>>>>>>>> of which catalog it used to fetch the view metadata. It can 
>>>>>>>>>>>>>>>>>> then use that
>>>>>>>>>>>>>>>>>> information to resolve the table references in the view's SQL 
>>>>>>>>>>>>>>>>>> definition in
>>>>>>>>>>>>>>>>>> case no default catalog is specified.
>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>> >>>>> 2. The important part is that the catalog can be
>>>>>>>>>>>>>>>>>> referenced at execution time. As long as that's the case I 
>>>>>>>>>>>>>>>>>> would assume the
>>>>>>>>>>>>>>>>>> view can be created in any catalog.
>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>> >>>>> I think your point is really valuable because the
>>>>>>>>>>>>>>>>>> current specification can lead to some unintuitive behavior. 
>>>>>>>>>>>>>>>>>> For example,
>>>>>>>>>>>>>>>>>> consider the following statement:
>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>> >>>>> CREATE VIEW catalogA.sales.monthly_orders AS SELECT
>>>>>>>>>>>>>>>>>> * from sales.orders;
>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>> >>>>> If the session default catalog is not "catalogA",
>>>>>>>>>>>>>>>>>> the "sales.orders" in the view query would not be the same 
>>>>>>>>>>>>>>>>>> as just
>>>>>>>>>>>>>>>>>> referencing "sales.orders" in a normal SQL statement. This 
>>>>>>>>>>>>>>>>>> is because
>>>>>>>>>>>>>>>>>> without a "default-catalog", the catalog name of 
>>>>>>>>>>>>>>>>>> "sales.orders" would
>>>>>>>>>>>>>>>>>> default to "catalogA".
>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>> >>>>> However, I like the current design of the view
>>>>>>>>>>>>>>>>>> spec, because it has the "closure" property. Because of the 
>>>>>>>>>>>>>>>>>> fact that the
>>>>>>>>>>>>>>>>>> "view catalog" has to be known when executing a view, all 
>>>>>>>>>>>>>>>>>> the information
>>>>>>>>>>>>>>>>>> required to resolve the table identifiers is contained in 
>>>>>>>>>>>>>>>>>> the view metadata
>>>>>>>>>>>>>>>>>> (and the "view catalog"). I think that if you make the 
>>>>>>>>>>>>>>>>>> identifier
>>>>>>>>>>>>>>>>>> resolution dependent on external parameters, it hinders 
>>>>>>>>>>>>>>>>>> portability.
>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>> >>>>> Thanks,
>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>> >>>>> Jan
>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>> >>>>> On 4/22/25 18:36, Walaa Eldin Moustafa wrote:
>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>> >>>>> Hi Jan,
>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>> >>>>> Thanks for the thoughtful feedback.
>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>> >>>>> I think it’s important we clarify a key point
>>>>>>>>>>>>>>>>>> before going deeper:
>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>> >>>>> Non-determinism is not caused by session fallback
>>>>>>>>>>>>>>>>>> behavior—it’s a fundamental limitation of using table 
>>>>>>>>>>>>>>>>>> identifiers alone,
>>>>>>>>>>>>>>>>>> regardless of whether we use the current rule, the proposed 
>>>>>>>>>>>>>>>>>> fallback to the
>>>>>>>>>>>>>>>>>> session’s default catalog, or even early vs. late binding.
>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>> >>>>> The same fully qualified identifier (e.g.,
>>>>>>>>>>>>>>>>>> catalogA.namespace.table) can resolve to different objects 
>>>>>>>>>>>>>>>>>> depending solely
>>>>>>>>>>>>>>>>>> on engine-specific routing logic or catalog aliases. So 
>>>>>>>>>>>>>>>>>> determinism isn’t
>>>>>>>>>>>>>>>>>> guaranteed just because an identifier is "fully qualified." 
>>>>>>>>>>>>>>>>>> The only
>>>>>>>>>>>>>>>>>> reliable anchor for identity is the UUID. That’s why the 
>>>>>>>>>>>>>>>>>> proposed use of
>>>>>>>>>>>>>>>>>> UUIDs is not just a hardening strategy. It’s the actual fix 
>>>>>>>>>>>>>>>>>> for correctness.
>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>> >>>>> To move the conversation forward, could you help
>>>>>>>>>>>>>>>>>> clarify two things in the context of the current spec:
>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>> >>>>> * Where in the metadata is the “view catalog”
>>>>>>>>>>>>>>>>>> stored, so that an engine knows to fall back to it if 
>>>>>>>>>>>>>>>>>> default-catalog is
>>>>>>>>>>>>>>>>>> null?
>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>> >>>>> * Are we even allowed to create views in the
>>>>>>>>>>>>>>>>>> session's default catalog (i.e., without specifying a 
>>>>>>>>>>>>>>>>>> catalog) in the
>>>>>>>>>>>>>>>>>> current Iceberg spec?
>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>> >>>>> These questions are important because if we can’t
>>>>>>>>>>>>>>>>>> unambiguously recover the "view catalog" from metadata, then 
>>>>>>>>>>>>>>>>>> defaulting to
>>>>>>>>>>>>>>>>>> it is problematic. And if views can't be created in the 
>>>>>>>>>>>>>>>>>> default catalog,
>>>>>>>>>>>>>>>>>> then the fallback rule doesn’t generalize.
>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>> >>>>> Thanks,
>>>>>>>>>>>>>>>>>> >>>>> Walaa.
>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>> >>>>> On Tue, Apr 22, 2025 at 3:14 AM Jan Kaul
>>>>>>>>>>>>>>>>>> <jank...@mailbox.org.invalid> wrote:
>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>> >>>>>> Hi Walaa,
>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>> >>>>>> Thank you for your proposal. If I understood
>>>>>>>>>>>>>>>>>> correctly, your proposal is composed of three parts:
>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>> >>>>>> - session default catalog as fallback for
>>>>>>>>>>>>>>>>>> "default-catalog"
>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>> >>>>>> - session default namespace as fallback for
>>>>>>>>>>>>>>>>>> "default-namespace"
>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>> >>>>>> - Late binding + UUID validation
>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>> >>>>>> I have some comments regarding these points.
>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>> >>>>>> 1. Session default catalog as fallback for
>>>>>>>>>>>>>>>>>> "default-catalog"
>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>> >>>>>> Introducing a behavior that depends on the current
>>>>>>>>>>>>>>>>>> session setup is in my opinion the definition of 
>>>>>>>>>>>>>>>>>> "non-determinism". You
>>>>>>>>>>>>>>>>>> could be running the same query-engine and catalog-setup on 
>>>>>>>>>>>>>>>>>> different days,
>>>>>>>>>>>>>>>>>> with different default session catalogs (which is rather 
>>>>>>>>>>>>>>>>>> common), and would
>>>>>>>>>>>>>>>>>> be getting different results.
>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>> >>>>>> Whereas with the current behavior, the view always
>>>>>>>>>>>>>>>>>> produces the same results. The current behavior has some 
>>>>>>>>>>>>>>>>>> rough edges in
>>>>>>>>>>>>>>>>>> very niche use cases, but I think it is solid for most use 
>>>>>>>>>>>>>>>>>> cases.
>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>> >>>>>> 2. Session default namespace as fallback for
>>>>>>>>>>>>>>>>>> "default-namespace"
>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>> >>>>>> Similar to the above.
>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>> >>>>>> 3. Late binding + UUID validation
>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>> >>>>>> If I understand it correctly, the current
>>>>>>>>>>>>>>>>>> implementation already uses late binding.
>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>> >>>>>> Generally, having UUID validation makes the setup
>>>>>>>>>>>>>>>>>> more robust, which is great. However, having UUID validation 
>>>>>>>>>>>>>>>>>> still requires
>>>>>>>>>>>>>>>>>> us to have a portable table identifier specification. Even 
>>>>>>>>>>>>>>>>>> if we have the
>>>>>>>>>>>>>>>>>> UUIDs of the referenced tables from the view, there simply 
>>>>>>>>>>>>>>>>>> isn't an
>>>>>>>>>>>>>>>>>> interface that lets us use those UUIDs. The catalog 
>>>>>>>>>>>>>>>>>> interface is defined
>>>>>>>>>>>>>>>>>> in terms of table identifiers.
>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>> >>>>>> So we always require a working catalog setup and
>>>>>>>>>>>>>>>>>> suitable table identifiers to obtain the table metadata. We 
>>>>>>>>>>>>>>>>>> can use the
>>>>>>>>>>>>>>>>>> UUIDs to verify if we loaded the correct table. But this can 
>>>>>>>>>>>>>>>>>> only be done
>>>>>>>>>>>>>>>>>> after we have used some identifier, which means there is no way 
>>>>>>>>>>>>>>>>>> of using UUIDs
>>>>>>>>>>>>>>>>>> without a functioning catalog/identifier setup.
>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>> >>>>>> In conclusion, I prefer the current behavior for
>>>>>>>>>>>>>>>>>> "default-catalog" because it is more deterministic in my 
>>>>>>>>>>>>>>>>>> opinion. And I
>>>>>>>>>>>>>>>>>> think the current spec does a good job for multi-engine 
>>>>>>>>>>>>>>>>>> table identifier
>>>>>>>>>>>>>>>>>> resolution. I see the UUID validation more as an additional 
>>>>>>>>>>>>>>>>>> hardening
>>>>>>>>>>>>>>>>>> strategy.
>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>> >>>>>> Thanks
>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>> >>>>>> Jan
>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>> >>>>>> On 4/21/25 17:38, Walaa Eldin Moustafa wrote:
>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>> >>>>>> Thanks Renjie!
>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>> >>>>>> The existing spec has some guidance on resolving
>>>>>>>>>>>>>>>>>> catalogs on the fly already (to address the case of view 
>>>>>>>>>>>>>>>>>> text with table
>>>>>>>>>>>>>>>>>> identifiers missing the catalog part). The guidance is to 
>>>>>>>>>>>>>>>>>> use the catalog
>>>>>>>>>>>>>>>>>> where the view is stored. But I find this rule hard to 
>>>>>>>>>>>>>>>>>> interpret or use.
>>>>>>>>>>>>>>>>>> The catalog itself is a logical construct—such as a 
>>>>>>>>>>>>>>>>>> federated catalog that
>>>>>>>>>>>>>>>>>> delegates to multiple physical backends (e.g., HMS and 
>>>>>>>>>>>>>>>>>> REST). In such
>>>>>>>>>>>>>>>>>> cases, the catalog (e.g., `my_catalog` in 
>>>>>>>>>>>>>>>>>> `my_catalog.namespace1.table1`)
>>>>>>>>>>>>>>>>>> doesn’t physically store the tables; it only routes requests 
>>>>>>>>>>>>>>>>>> to underlying
>>>>>>>>>>>>>>>>>> stores. Therefore, defaulting identifier resolution based on 
>>>>>>>>>>>>>>>>>> the catalog
>>>>>>>>>>>>>>>>>> where the view is "stored" doesn’t align with how catalogs 
>>>>>>>>>>>>>>>>>> actually behave
>>>>>>>>>>>>>>>>>> in practice.
>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>> >>>>>> Thanks,
>>>>>>>>>>>>>>>>>> >>>>>> Walaa.
>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>> >>>>>> On Sun, Apr 20, 2025 at 11:17 PM Renjie Liu <
>>>>>>>>>>>>>>>>>> liurenjie2...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>>> >>>>>>> Hi, Walaa:
>>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>>> >>>>>>> Thanks for the proposal.
>>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>>> >>>>>>> I've reviewed the doc, but in general I have some
>>>>>>>>>>>>>>>>>> concerns with resolving catalog names on the fly with query 
>>>>>>>>>>>>>>>>>> engine defined
>>>>>>>>>>>>>>>>>> catalog names. This introduces some flexibility at first 
>>>>>>>>>>>>>>>>>> glance, but also
>>>>>>>>>>>>>>>>>> makes misconfiguration difficult to explain.
>>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>>> >>>>>>> But I agree with one part that we should store
>>>>>>>>>>>>>>>>>> resolved table UUIDs in view metadata, as table/view renaming 
>>>>>>>>>>>>>>>>>> may introduce
>>>>>>>>>>>>>>>>>> errors that are difficult for users to understand.
>>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>>> >>>>>>> On Sat, Apr 19, 2025 at 3:02 AM Walaa Eldin
>>>>>>>>>>>>>>>>>> Moustafa <wa.moust...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>> >>>>>>>> Hi Everyone,
>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>> >>>>>>>> Looking forward to keeping up the momentum and
>>>>>>>>>>>>>>>>>> closing out the MV spec as well. I’m hoping we can proceed 
>>>>>>>>>>>>>>>>>> to a vote next
>>>>>>>>>>>>>>>>>> week.
>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>> >>>>>>>> Here is a summary in case that helps. The
>>>>>>>>>>>>>>>>>> proposal outlines a strategy for handling table identifiers 
>>>>>>>>>>>>>>>>>> in Iceberg view
>>>>>>>>>>>>>>>>>> metadata, with the goal of ensuring correctness, 
>>>>>>>>>>>>>>>>>> portability, and engine
>>>>>>>>>>>>>>>>>> compatibility. It recommends resolving table identifiers at 
>>>>>>>>>>>>>>>>>> read time (late
>>>>>>>>>>>>>>>>>> binding) rather than creation time, and introduces 
>>>>>>>>>>>>>>>>>> UUID-based validation to
>>>>>>>>>>>>>>>>>> maintain identity guarantees across engines or sessions. It 
>>>>>>>>>>>>>>>>>> also revises
>>>>>>>>>>>>>>>>>> how default-catalog and default-namespace are handled 
>>>>>>>>>>>>>>>>>> (defaulting both to
>>>>>>>>>>>>>>>>>> the session context if not explicitly set) to better align 
>>>>>>>>>>>>>>>>>> with engine
>>>>>>>>>>>>>>>>>> behavior and improve cross-engine interoperability.
>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>> >>>>>>>> Please let me know your thoughts.
>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>> >>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> >>>>>>>> Walaa.
>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>> >>>>>>>> On Wed, Apr 16, 2025 at 2:03 PM Walaa Eldin
>>>>>>>>>>>>>>>>>> Moustafa <wa.moust...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>>>>>>> >>>>>>>>> Thanks Eduard and Sung! I have addressed the
>>>>>>>>>>>>>>>>>> comments.
>>>>>>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>>>>>>> >>>>>>>>> One key point to keep in mind is that catalog
>>>>>>>>>>>>>>>>>> names in the spec refer to logical catalogs—i.e., the first 
>>>>>>>>>>>>>>>>>> part of a
>>>>>>>>>>>>>>>>>> three-part identifier. These correspond to Spark's 
>>>>>>>>>>>>>>>>>> DataSourceV2 catalogs,
>>>>>>>>>>>>>>>>>> Trino connectors, and similar constructs. This is a level of 
>>>>>>>>>>>>>>>>>> abstraction
>>>>>>>>>>>>>>>>>> above physical catalogs, which are not referenced or used in 
>>>>>>>>>>>>>>>>>> the view spec.
>>>>>>>>>>>>>>>>>> The reason is that table identifiers in the view 
>>>>>>>>>>>>>>>>>> definition/text itself
>>>>>>>>>>>>>>>>>> refer to logical catalogs, not physical ones (since they 
>>>>>>>>>>>>>>>>>> interface directly
>>>>>>>>>>>>>>>>>> with the engine and not a specific metastore).
>>>>>>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>>>>>>> >>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> >>>>>>>>> Walaa.
>>>>>>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>>>>>>> >>>>>>>>> On Wed, Apr 16, 2025 at 6:15 AM Sung Yun <
>>>>>>>>>>>>>>>>>> sungwy...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>>>>>>>>> >>>>>>>>>> Thank you Walaa for the proposal. I think view
>>>>>>>>>>>>>>>>>> portability is a very important topic for us to continue 
>>>>>>>>>>>>>>>>>> discussing as it
>>>>>>>>>>>>>>>>>> relies on many assumptions within the data ecosystem for it 
>>>>>>>>>>>>>>>>>> to function,
>>>>>>>>>>>>>>>>>> as you've highlighted well in the document.
>>>>>>>>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>>>>>>>>> >>>>>>>>>> I've added a few comments around how this may
>>>>>>>>>>>>>>>>>> impact the permission questions the engines will be asking, 
>>>>>>>>>>>>>>>>>> and whether
>>>>>>>>>>>>>>>>>> that is the desired behavior.
>>>>>>>>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>>>>>>>>> >>>>>>>>>> Sung
>>>>>>>>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>>>>>>>>> >>>>>>>>>> On Wed, Apr 16, 2025 at 7:32 AM Eduard
>>>>>>>>>>>>>>>>>> Tudenhöfner <etudenhoef...@apache.org> wrote:
>>>>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>>>>> >>>>>>>>>>> Thanks Walaa for tackling this problem. I've
>>>>>>>>>>>>>>>>>> added a few comments to get a better understanding of what 
>>>>>>>>>>>>>>>>>> this will look
>>>>>>>>>>>>>>>>>> like in the actual implementation.
>>>>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>>>>> >>>>>>>>>>> Eduard
>>>>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>>>>> >>>>>>>>>>> On Tue, Apr 15, 2025 at 7:09 PM Walaa Eldin
>>>>>>>>>>>>>>>>>> Moustafa <wa.moust...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Hi Everyone,
>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Starting this thread to resume our
>>>>>>>>>>>>>>>>>> discussion on how to reference table identifiers from 
>>>>>>>>>>>>>>>>>> Iceberg metadata, a
>>>>>>>>>>>>>>>>>> key aspect of the view specification, particularly in 
>>>>>>>>>>>>>>>>>> relation to the MV
>>>>>>>>>>>>>>>>>> (materialized view) extensions.
>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> I had the chance to speak offline with a few
>>>>>>>>>>>>>>>>>> community members to better understand how the current spec 
>>>>>>>>>>>>>>>>>> is being
>>>>>>>>>>>>>>>>>> interpreted. Those conversations served as inputs to a new 
>>>>>>>>>>>>>>>>>> proposal on how
>>>>>>>>>>>>>>>>>> table identifier references could be represented in metadata.
>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> You can find the proposal here [1]. I look
>>>>>>>>>>>>>>>>>> forward to your feedback and working together to move this 
>>>>>>>>>>>>>>>>>> forward so we
>>>>>>>>>>>>>>>>>> can finalize the MV spec as well.
>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1-I2v_OqBgJi_8HVaeH1u2jowghmXoB8XaJLzPBa_Hg8/edit?tab=t.0
>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Walaa.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
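
For readers of this archive, the resolution rule debated above can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the Iceberg spec or any Iceberg library: the function names (resolve_identifier, load_and_validate), the single-level namespace, and the dictionary standing in for a set of catalogs are all hypothetical. It shows the current-spec fallback to the view's catalog when "default-catalog" is absent, and Jan's point that a UUID can only be checked after the table has been loaded by identifier.

```python
from typing import Dict, Optional, Sequence, Tuple

# (catalog, namespace, table); namespaces are assumed to be single-level here.
Identifier = Tuple[str, str, str]


def resolve_identifier(
    parts: Sequence[str],
    view_catalog: str,
    default_catalog: Optional[str] = None,
    default_namespace: Optional[str] = None,
) -> Identifier:
    """Resolve a possibly partial identifier found in a view's SQL text.

    Assumed rule (as discussed in the thread): a missing catalog falls back
    to "default-catalog", and if that is absent, to the catalog the view
    itself was loaded from.
    """
    catalog = default_catalog if default_catalog is not None else view_catalog
    if len(parts) == 3:
        return (parts[0], parts[1], parts[2])
    if len(parts) == 2:
        return (catalog, parts[0], parts[1])
    if len(parts) == 1:
        if default_namespace is None:
            raise ValueError("no namespace available for " + parts[0])
        return (catalog, default_namespace, parts[0])
    raise ValueError("unsupported identifier: " + ".".join(parts))


def load_and_validate(
    catalogs: Dict[str, Dict[Tuple[str, str], str]],
    identifier: Identifier,
    expected_uuid: Optional[str],
) -> str:
    """Load a table UUID by identifier, then compare it with the stored UUID.

    The lookup has to succeed through the identifier first; the UUID is only
    a check applied afterwards, not a way to locate the table.
    """
    catalog, namespace, table = identifier
    actual_uuid = catalogs[catalog][(namespace, table)]
    if expected_uuid is not None and actual_uuid != expected_uuid:
        raise ValueError("UUID mismatch for " + ".".join(identifier))
    return actual_uuid
```

With the view from the example above (catalogA.sales.monthly_orders selecting from sales.orders), resolve_identifier(["sales", "orders"], view_catalog="catalogA") yields the catalogA table regardless of the session's default catalog, which is the behavior Jan describes for the current spec.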
