Re: [DISCUSS] Table Identifiers in Iceberg View Spec

Daniel Weeks Thu, 08 May 2025 08:47:40 -0700

I don't think we want to include the resolved table UUIDs in the view
definition, but rather in the storage table state.  You can still resolve
whether those drift at some point, but I don't feel like it's a good idea
to capture data in the view that we may allow to drift if there isn't any
requirement that they match.  This also aligns with the identifier
resolution being late binding.
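
A minimal sketch of that split, assuming a simplified model: identifiers
are late-bound through default-catalog / default-namespace (falling back to
the view's catalog), while the UUIDs observed at refresh time live in the
storage table state and are only used to detect drift. The names TableRef,
ViewVersion, resolve, find_drift and load_uuid are hypothetical
illustrations, not Iceberg APIs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional, Tuple

ResolvedId = Tuple[str, Tuple[str, ...], str]  # (catalog, namespace, table)


@dataclass(frozen=True)
class TableRef:
    """A table reference as written in the view SQL (hypothetical type)."""
    name: str
    namespace: Optional[Tuple[str, ...]] = None
    catalog: Optional[str] = None


@dataclass(frozen=True)
class ViewVersion:
    """Only the resolution fields of a view version (hypothetical type)."""
    default_namespace: Tuple[str, ...]
    default_catalog: Optional[str] = None


def resolve(ref: TableRef, view: ViewVersion, view_catalog: str) -> ResolvedId:
    # Late binding: fill in missing parts at query time. A null
    # default-catalog falls back to the catalog the view was loaded from.
    catalog = ref.catalog or view.default_catalog or view_catalog
    namespace = ref.namespace if ref.namespace is not None else view.default_namespace
    return (catalog, namespace, ref.name)


def find_drift(
    refs: Iterable[TableRef],
    view: ViewVersion,
    view_catalog: str,
    recorded_uuids: Dict[TableRef, str],
    load_uuid: Callable[[ResolvedId], Optional[str]],
) -> List[ResolvedId]:
    # recorded_uuids stands in for what the storage table state captured;
    # the view definition itself carries no UUIDs in this sketch.
    drifted = []
    for ref in refs:
        ident = resolve(ref, view, view_catalog)
        if load_uuid(ident) != recorded_uuids.get(ref):
            drifted.append(ident)
    return drifted


if __name__ == "__main__":
    view = ViewVersion(default_namespace=("sales",))
    orders = TableRef("orders")
    live = {("catalogA", ("sales",), "orders"): "uuid-1"}
    print(find_drift([orders], view, "catalogA", {orders: "uuid-1"}, live.get))
```

Whether a non-empty result fails the query or only marks the storage table
stale is a policy choice that this sketch leaves open.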


-Dan

On Wed, May 7, 2025 at 10:45 PM Walaa Eldin Moustafa <wa.moust...@gmail.com>
wrote:

> Thanks Steven! So would you agree that resolution using default-catalog
> and default-namespace does not provide full determinism, and requires a
> supporting safety mechanism?
>
> Thanks,
> Walaa.
>
> On Wed, May 7, 2025 at 10:30 PM Steven Wu <stevenz...@gmail.com> wrote:
>
>> > If the current model is considered deterministic, do you think
>> `default-catalog` and `default-namespace` fields provide enough determinism
>> to eliminate the need for UUIDs when storing table identifiers?
>>
>> I am fine with storing UUIDs for table identifiers in the view.
>> Basically, view creation resolves all referenced tables/views with UUIDs.
>> View consumers can validate resolved tables/views with the stored UUIDs and
>> fail the query if mismatch.
>>
>> The UUID change doesn't really change the table identifier resolution
>> rule though. It is more of a safety protection.
>>
>> On Wed, May 7, 2025 at 10:02 PM Walaa Eldin Moustafa <
>> wa.moust...@gmail.com> wrote:
>>
>>> Hi Steven,
>>>
>>> Thanks for the reply.
>>>
>>> > I agree with Dan that we shouldn't solve catalog naming in the Iceberg
>>> view spec.
>>>
>>> To clarify, I don't believe the proposal is trying to solve catalog
>>> naming. What it’s doing is simply this:
>>>
>>> * Proposing that table names inside views resolve the same way as they
>>> do elsewhere (e.g., queries).
>>> * Adopting a model that is already widely used and supported in the
>>> existing ecosystem, which allows for:
>>>     -- Renaming catalog aliases
>>>     -- Swapping catalog implementations behind consistent names
>>>     -- Having different default catalog names across engines that still
>>> point to the same underlying tables
>>>
>>> These are common patterns in production data lakes. Saying Iceberg views
>>> cannot operate in those environments feels unrealistic. In practice, it
>>> means the spec breaks down in situations that users encounter regularly.
>>>
>>> > The recommendation of using engines’ current catalog and database can
>>> cause context-dependent resolution results.
>>>
>>> * As noted in the doc and earlier replies, fixing a catalog name doesn’t
>>> actually guarantee determinism either. All the failure scenarios above
>>> still apply even when a default-catalog is stored.
>>> * The current spec also allows default-catalog to be null, in which case
>>> it falls back to the view’s catalog, yet that catalog is determined based
>>> on how the view is referenced in the query, which would be considered
>>> non-deterministic based on the same criteria you shared.
>>> * The only true form of determinism here is UUID-based validation, which
>>> protects against silent drift in any resolution model.
>>>
>>> If the current model is considered deterministic, do you think
>>> `default-catalog` and `default-namespace` fields provide enough determinism
>>> to eliminate the need for UUIDs when storing table identifiers?
>>> Or put another way: Would you be comfortable relying solely on
>>> default-catalog + default-namespace + table name to re-identify the correct
>>> table, without UUID validation?
>>>
>>> +1 on involving other communities. I’m happy to help facilitate a
>>> cross-community discussion if we aren’t able to reach a resolution here.
>>>
>>> Thanks,
>>> Walaa.
>>>
>>>
>>>
>>> On Wed, May 7, 2025 at 9:20 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>
>>>> I agree with Dan that we shouldn't solve catalog naming in the Iceberg
>>>> view spec. I am not convinced that the proposed change will make the table
>>>> identifier resolution more clear and portable. The recommendation of using
>>>> engines' current catalog and database can cause context dependent
>>>> resolution results, which seems non-deterministic to me.
>>>>
>>>> Walaa, you raised a point in the doc that the current catalog
>>>> resolution logic (default-catalog field, then view catalog) is challenging
>>>> and unrealistic for engines (like Spark and Trino). It will be great to get
>>>> more inputs from the broader community on this part.
>>>>
>>>>
>>>> On Tue, May 6, 2025 at 9:21 AM Benny Chow <btc...@gmail.com> wrote:
>>>>
>>>>> In Spark, I believe that the USE command sets the current catalog and
>>>>> namespace.  This affects both where the view is created and how
>>>>> unqualified table identifiers are resolved.  I also don't see an issue
>>>>> with saving the current catalog and namespace into the view metadata's
>>>>> default-catalog and default-namespace fields.
>>>>>
>>>>> On Wed, Apr 30, 2025 at 5:12 PM Walaa Eldin Moustafa <
>>>>> wa.moust...@gmail.com> wrote:
>>>>>
>>>>>> > I think that's the lesser evil compared to Iceberg specifying how
>>>>>> engines should resolve identifiers
>>>>>>
>>>>>> I think this is also similar to the previous point. It is the other
>>>>>> way around. Right now the spec dictates how to resolve identifiers
>>>>>> (by employing a view-specific `default-catalog` field). The proposal
>>>>>> suggests getting out of this space and letting engines handle it the
>>>>>> same way they handle all other identifiers.
>>>>>>
>>>>>> On Wed, Apr 30, 2025 at 5:07 PM Walaa Eldin Moustafa <
>>>>>> wa.moust...@gmail.com> wrote:
>>>>>>
>>>>>>> > I thought "default-catalog" could be set via the USE command.
>>>>>>>
>>>>>>> Benny, I think this is a misconception or miscommunication. The USE
>>>>>>> command has no impact on the `default-catalog` field. In fact, the
>>>>>>> proposal's direction is exactly to establish that the USE command
>>>>>>> should influence how tables are resolved, the same as everywhere else.
>>>>>>> Right now that is not the case under the current spec.
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Apr 30, 2025 at 3:17 PM Benny Chow <btc...@gmail.com> wrote:
>>>>>>>
>>>>>>>> > there is no SQL construct today to explicitly set default-catalog
>>>>>>>>
>>>>>>>> I thought "default-catalog" could be set via the USE command.
>>>>>>>>
>>>>>>>> I generally agree with Dan about requiring consistent catalog
>>>>>>>> names.  I think that's the lesser evil compared to Iceberg specifying
>>>>>>>> how engines should resolve identifiers.  Another thing to consider is
>>>>>>>> that identifier resolution can be very expensive at query validation
>>>>>>>> time if identifiers need to be looked up from a bunch of places.
>>>>>>>> Hopefully, it should be possible to define a view in such a way that
>>>>>>>> identifiers can be resolved on the first try.
>>>>>>>>
>>>>>>>> Benny
>>>>>>>>
>>>>>>>> On Tue, Apr 29, 2025 at 10:29 PM Walaa Eldin Moustafa <
>>>>>>>> wa.moust...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Rishabh,
>>>>>>>>>
>>>>>>>>> You're right that the proposal touches on two aspects, and
>>>>>>>>> resolution rules are one of them. The other aspect is the proposal's
>>>>>>>>> position that table identifiers should be stored in metadata exactly
>>>>>>>>> as they appear in the view text (e.g., even if they're two-part or
>>>>>>>>> partially qualified), along with their corresponding UUIDs for
>>>>>>>>> validation. This applies both to referenced input tables and the
>>>>>>>>> storage table identifier in materialized views.
>>>>>>>>>
>>>>>>>>> We may be able to converge on this storage format even if we
>>>>>>>>> haven't yet converged on the resolution fallback rules. I believe both
>>>>>>>>> resolution strategies currently being discussed would still lead to
>>>>>>>>> storing identifiers in this way.
>>>>>>>>>
>>>>>>>>> I'm supportive of moving forward with consensus on the identifier
>>>>>>>>> storage format. That said, we may continue to run into questions
>>>>>>>>> related to resolution during implementation. For example: Should the
>>>>>>>>> storage table identifier follow the same default-catalog and
>>>>>>>>> default-namespace resolution behavior as other table references?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Walaa.
>>>>>>>>>
>>>>>>>>> On Tue, Apr 29, 2025 at 10:07 PM Rishabh Bhatia <
>>>>>>>>> bhatiarishab...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hello Walaa,
>>>>>>>>>>
>>>>>>>>>> Thanks for starting this discussion.
>>>>>>>>>>
>>>>>>>>>> I think we should decouple at least the MV Spec from the proposal
>>>>>>>>>> to change the current behavior of view resolution.
>>>>>>>>>>
>>>>>>>>>> We can continue discussing whether the current view spec needs
>>>>>>>>>> to be changed. Depending on that decision, we can update the view
>>>>>>>>>> resolution rule later if required.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Rishabh
>>>>>>>>>>
>>>>>>>>>> On Mon, Apr 28, 2025 at 3:22 PM Walaa Eldin Moustafa <
>>>>>>>>>> wa.moust...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Correction of typo: both engines seem to set default-catalog to
>>>>>>>>>>> the view catalog if it is defined, or to null if the view catalog
>>>>>>>>>>> is not defined.
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Apr 28, 2025 at 3:06 PM Walaa Eldin Moustafa <
>>>>>>>>>>> wa.moust...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Dan,
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks again for your response.
>>>>>>>>>>>>
>>>>>>>>>>>> I agree that catalog renaming is an environmental event, but
>>>>>>>>>>>> it's a real one that happens frequently in practice.
>>>>>>>>>>>> Saying that the Iceberg spec cannot accommodate something as
>>>>>>>>>>>> common as catalog renaming feels very restrictive, and could make
>>>>>>>>>>>> the spec less practical, even unusable, for real-world deployments.
>>>>>>>>>>>> I’m sharing this from the perspective of a large data lake
>>>>>>>>>>>> environment where views are heavily deployed and operationalized.
>>>>>>>>>>>>
>>>>>>>>>>>> Further, it's worth noting that the table spec is resilient to
>>>>>>>>>>>> catalog renaming, but the view spec is not. If we have an
>>>>>>>>>>>> opportunity to make the view spec similarly resilient, I wonder
>>>>>>>>>>>> why not?
>>>>>>>>>>>> Both specifications are deterministic in their definition, but
>>>>>>>>>>>> one is more fragile to environmental changes than the other.
>>>>>>>>>>>> Improving resilience does not sacrifice determinism. It simply
>>>>>>>>>>>> makes views safer and more portable over time.
>>>>>>>>>>>>
>>>>>>>>>>>> Separately, given that there is no SQL construct today to
>>>>>>>>>>>> explicitly set default-catalog at creation time, what is the
>>>>>>>>>>>> intuition behind how engines like Spark and Trino currently assign
>>>>>>>>>>>> default-catalog?
>>>>>>>>>>>> Today, both engines seem to set default-catalog to null if the
>>>>>>>>>>>> view catalog is defined, or to the view catalog if not.
>>>>>>>>>>>> What was the intended thought process behind this behavior?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Walaa
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Apr 28, 2025 at 1:33 PM Daniel Weeks <dwe...@apache.org>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Walaa,
>>>>>>>>>>>>>
>>>>>>>>>>>>> > tables inside views remain reachable after a catalog rename
>>>>>>>>>>>>>
>>>>>>>>>>>>> This problem stems from the exact environmental/configuration
>>>>>>>>>>>>> issue that we should not be trying to address.  I don't think we
>>>>>>>>>>>>> would expect references to survive a catalog rename.  That's not
>>>>>>>>>>>>> something covered by the spec and needs to be handled separately
>>>>>>>>>>>>> as a platform-level migration specific to the affected environment.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The identifier resolution logic is clear and deterministic.
>>>>>>>>>>>>> It should not matter whether an engine resolves and encodes the
>>>>>>>>>>>>> default-catalog or leaves it to the resolution rules.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The issue isn't with how the spec is defined, but rather view
>>>>>>>>>>>>> behavior when you start altering the environment around it, which
>>>>>>>>>>>>> isn't something we should be trying to define here.
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Dan
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Apr 28, 2025 at 12:17 PM Walaa Eldin Moustafa <
>>>>>>>>>>>>> wa.moust...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Dan,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks for chiming in.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I believe the issues we’re seeing now go beyond just catalog
>>>>>>>>>>>>>> naming consistency. The behavior around default-catalog itself
>>>>>>>>>>>>>> introduces resolution inconsistencies even when catalog names are
>>>>>>>>>>>>>> consistent.
>>>>>>>>>>>>>> For example:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> * When default-catalog is set to null, tables inside views
>>>>>>>>>>>>>> remain reachable after a catalog rename. But if it is set to a
>>>>>>>>>>>>>> non-null value, table references will break.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> * default-catalog causes table references inside views to be
>>>>>>>>>>>>>> early bound (i.e., bound at view creation time, especially when
>>>>>>>>>>>>>> using a non-null value), while table references inside standalone
>>>>>>>>>>>>>> queries are late bound (bound at query time). This creates
>>>>>>>>>>>>>> inconsistencies when resolving the same table name inside and
>>>>>>>>>>>>>> outside views, even within the same job.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> * It causes Spark's and Trino's behavior to drift from the
>>>>>>>>>>>>>> spec. There is no way to fully align Spark's behavior without
>>>>>>>>>>>>>> making invasive changes to the Spark SQL grammar and the View
>>>>>>>>>>>>>> DataSource API (specifically on the CREATE side). This challenge
>>>>>>>>>>>>>> would extend to other engines too. Both Spark and Trino set this
>>>>>>>>>>>>>> field based on a heuristic in today's implementation.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> * With view nesting (views depending on views), these
>>>>>>>>>>>>>> inconsistencies amplify further, forcing users and engines to
>>>>>>>>>>>>>> reason about catalog resolution at every level in the view tree.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> * It will be difficult to migrate Hive views to Iceberg with
>>>>>>>>>>>>>> that model. Migrated Hive views will have to deviate from that spec.
>>>>>>>>>>>>>> How would you suggest approaching the engine-level changes
>>>>>>>>>>>>>> required to support the current default-catalog field?
>>>>>>>>>>>>>> Also, do you believe the Spark and Trino communities would
>>>>>>>>>>>>>> align around having table resolution behave inconsistently
>>>>>>>>>>>>>> between queries and views, or between Iceberg and other types of
>>>>>>>>>>>>>> views?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Walaa
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Apr 28, 2025 at 11:34 AM Daniel Weeks <
>>>>>>>>>>>>>> dwe...@apache.org> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I would agree with Jan's summary of why 'default-catalog'
>>>>>>>>>>>>>>> was introduced, but I think we need to step back and align on
>>>>>>>>>>>>>>> what we are really attempting to support in the spec.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The issues we're discussing largely stem from using multiple
>>>>>>>>>>>>>>> engines with cross-catalog references and configurations where
>>>>>>>>>>>>>>> catalog names are not aligned.  If we have multiple engines that
>>>>>>>>>>>>>>> all have the same catalog names/configurations, the current spec
>>>>>>>>>>>>>>> implementation is well defined for table resolution even across
>>>>>>>>>>>>>>> catalogs.  The 'default-catalog' (and namespace equivalent) was
>>>>>>>>>>>>>>> intended to address the resolution within the context of the SQL
>>>>>>>>>>>>>>> text, not to address catalog/naming inconsistencies.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I feel like we're trying to adapt the original intent to
>>>>>>>>>>>>>>> address the catalog naming/configuration and would argue that
>>>>>>>>>>>>>>> we shouldn't attempt to do that as part of the spec.
>>>>>>>>>>>>>>> Inconsistently named catalogs are a reality, but we should
>>>>>>>>>>>>>>> consider that a configuration/environmental issue, not something
>>>>>>>>>>>>>>> to solve for in the spec.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> We should support and advocate for consistency in catalog
>>>>>>>>>>>>>>> naming and define the spec along those lines.  The fact is that
>>>>>>>>>>>>>>> with all of the recent work that's gone into making catalogs
>>>>>>>>>>>>>>> pluggable, it makes more sense to just register catalog
>>>>>>>>>>>>>>> configuration with consistent names (even if you have to
>>>>>>>>>>>>>>> duplicate the configuration for supporting existing
>>>>>>>>>>>>>>> readers/writers).  I think it's better to provide a path toward
>>>>>>>>>>>>>>> consistency than to normalize complicated schemes to work around
>>>>>>>>>>>>>>> the issues caused by environmental/configuration inconsistencies.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> If the goal is to create clever ways to hack the late
>>>>>>>>>>>>>>> binding resolution to swap in different catalogs or make
>>>>>>>>>>>>>>> references contextual, I feel like that is something we should
>>>>>>>>>>>>>>> strongly discourage as it leads to confusion about what is
>>>>>>>>>>>>>>> resolved as part of the query.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> At this point, I don't see a good argument to add
>>>>>>>>>>>>>>> additional configuration or change the resolution behaviors.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -Dan
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, Apr 28, 2025 at 12:40 AM Jan Kaul
>>>>>>>>>>>>>>> <jank...@mailbox.org.invalid> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I think the intention with the "default-catalog" was that
>>>>>>>>>>>>>>>> every query engine uses it to store its session default
>>>>>>>>>>>>>>>> catalog at the time of creating the view. This way the view
>>>>>>>>>>>>>>>> could be reused in another session. The idea was not to
>>>>>>>>>>>>>>>> introduce an additional SQL syntax to set the default-catalog.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Generally we have different environments we want to support
>>>>>>>>>>>>>>>> with the view spec:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1. Consistent catalog naming
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> When the environment supports it, using consistent catalog
>>>>>>>>>>>>>>>> names can have a great benefit for multi-catalog, multi-engine
>>>>>>>>>>>>>>>> setups. With consistent catalog names, using the
>>>>>>>>>>>>>>>> "default-catalog" field works without any issues.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2. Inconsistent catalog naming
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This can be the case when different query engines refer to
>>>>>>>>>>>>>>>> the same physical catalog by different names. This often
>>>>>>>>>>>>>>>> happens because different query engines use different
>>>>>>>>>>>>>>>> strategies to set up the catalogs. If catalogs have
>>>>>>>>>>>>>>>> inconsistent naming, using the "default-catalog" field does
>>>>>>>>>>>>>>>> not work because it is not guaranteed that the catalog name
>>>>>>>>>>>>>>>> can be resolved with another engine. Using the "view catalog"
>>>>>>>>>>>>>>>> as a fallback is a better solution for this use case, as it
>>>>>>>>>>>>>>>> avoids catalog names altogether. It is however limited to
>>>>>>>>>>>>>>>> table references in the same catalog.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> What do you think of introducing a view property that
>>>>>>>>>>>>>>>> specifies if the "default-catalog" or the "view catalog"
>>>>>>>>>>>>>>>> should be used? This way, you could use the "default-catalog"
>>>>>>>>>>>>>>>> in environments where you can guarantee consistent naming, but
>>>>>>>>>>>>>>>> you would be able to fall back directly to the "view catalog"
>>>>>>>>>>>>>>>> when you don't have consistent naming. The query engines could
>>>>>>>>>>>>>>>> set the default for this view property at creation time. Spark
>>>>>>>>>>>>>>>> for example could set it to automatically use the "view
>>>>>>>>>>>>>>>> catalog".
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Jan
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 4/26/25 05:33, Walaa Eldin Moustafa wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> To help folks catch up on the latest discussions and
>>>>>>>>>>>>>>>> interpretation of the spec, I have summarized everything we
>>>>>>>>>>>>>>>> discussed so far at the top of the proposal document (here
>>>>>>>>>>>>>>>> <https://docs.google.com/document/d/1-I2v_OqBgJi_8HVaeH1u2jowghmXoB8XaJLzPBa_Hg8/edit?tab=t.0>).
>>>>>>>>>>>>>>>> I have slightly updated the proposal to be in sync with the new
>>>>>>>>>>>>>>>> interpretation to avoid confusion. In summary:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> * Remove default-catalog and default-namespace fields from
>>>>>>>>>>>>>>>> the view spec completely.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> * Hence, we do not attempt to define separate view-level
>>>>>>>>>>>>>>>> default catalogs or namespaces.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Instead:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> * If a table identifier inside a view lacks a catalog
>>>>>>>>>>>>>>>> qualifier, engines should resolve it using the current engine
>>>>>>>>>>>>>>>> catalog at query time.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> * Reference table identifiers in the metadata exactly as
>>>>>>>>>>>>>>>> they appear in the view SQL text.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> * If an identifier lacks the catalog part at creation, it
>>>>>>>>>>>>>>>> should still lack a catalog in the stored metadata.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> * Store UUIDs alongside table identifiers whenever possible.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Walaa.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Fri, Apr 25, 2025 at 5:18 PM Walaa Eldin Moustafa <
>>>>>>>>>>>>>>>> wa.moust...@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks for the contribution Benny! +1 to the confusion the
>>>>>>>>>>>>>>>>> fallback creates. Also just to be clear, at this point and
>>>>>>>>>>>>>>>>> after clarifying the current spec intentions, I am convinced
>>>>>>>>>>>>>>>>> that we should remove the default catalog and default
>>>>>>>>>>>>>>>>> namespace fields altogether.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> Walaa.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Fri, Apr 25, 2025 at 5:13 PM Benny Chow <
>>>>>>>>>>>>>>>>> btc...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I'd like to contribute my opinions on this:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> - I don't particularly like the current behavior of
>>>>>>>>>>>>>>>>>> "default to the view's catalog when default-catalog is not
>>>>>>>>>>>>>>>>>> set".  Fundamentally, I believe the intent of default-catalog
>>>>>>>>>>>>>>>>>> and default-namespace is to help users write more concise
>>>>>>>>>>>>>>>>>> SQL.
>>>>>>>>>>>>>>>>>> - The Spark session catalog is engine-specific and I don't
>>>>>>>>>>>>>>>>>> think we should design something that says first use this
>>>>>>>>>>>>>>>>>> catalog, then that catalog... or that catalog.  For example,
>>>>>>>>>>>>>>>>>> resolving identifiers using default-catalog -> view's catalog
>>>>>>>>>>>>>>>>>> -> session catalog is not good.
>>>>>>>>>>>>>>>>>> - We gotta support non-Iceberg tables, otherwise I see no
>>>>>>>>>>>>>>>>>> value in putting views in the catalog to share with other
>>>>>>>>>>>>>>>>>> engines.
>>>>>>>>>>>>>>>>>> - Interoperability between different engine types is very
>>>>>>>>>>>>>>>>>> hard due to dialect issues... so I think we should focus on
>>>>>>>>>>>>>>>>>> supporting different clusters of the same engine type on a
>>>>>>>>>>>>>>>>>> shared catalog.  For example, AI and BI clusters on Spark
>>>>>>>>>>>>>>>>>> sharing the same views in a REST catalog.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Coincidentally, I think the ultimate solution is along
>>>>>>>>>>>>>>>>>> the lines of something Russell proposed last year:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> https://lists.apache.org/thread/hoskfx8y3kvrcww52l4w9dxghp3pnlm7
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> We've been looking at this interoperable identifier
>>>>>>>>>>>>>>>>>> problem through the lens of catalog resolution but maybe the
>>>>>>>>>>>>>>>>>> right approach is really about templating.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I would extend Russell's idea to allow identifiers in a
>>>>>>>>>>>>>>>>>> view to span catalogs to support non-Iceberg tables.   Also,
>>>>>>>>>>>>>>>>>> the default-catalog property could be templated as well.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thoughts?
>>>>>>>>>>>>>>>>>> Benny
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Fri, Apr 25, 2025 at 4:02 PM Walaa Eldin Moustafa <
>>>>>>>>>>>>>>>>>> wa.moust...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks Steven! How do you recommend making the Spark
>>>>>>>>>>>>>>>>>>> implementation conform to the spec? Do we need Spark SQL
>>>>>>>>>>>>>>>>>>> extensions and/or Spark catalog APIs for that?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> How do you recommend reconciling the inconsistencies I
>>>>>>>>>>>>>>>>>>> shared regarding many resolution methods not consistently
>>>>>>>>>>>>>>>>>>> being followed in different scenarios (view vs child table
>>>>>>>>>>>>>>>>>>> resolution, query vs view resolution)? Note these occur when
>>>>>>>>>>>>>>>>>>> the default catalog is set to a non-null value. If it helps,
>>>>>>>>>>>>>>>>>>> I can share concrete examples.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>> Walaa.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Fri, Apr 25, 2025 at 3:52 PM Steven Wu <
>>>>>>>>>>>>>>>>>>> stevenz...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> The core issue is the fallback behavior when
>>>>>>>>>>>>>>>>>>>> `default-catalog` is not defined. The current view spec says
>>>>>>>>>>>>>>>>>>>> the fallback should be the catalog where the view is
>>>>>>>>>>>>>>>>>>>> defined. It doesn't really matter what the catalog is named
>>>>>>>>>>>>>>>>>>>> (catalogX) by the read engine.
>>>>>>>>>>>>>>>>>>>> - If a view refers to the tables in the same catalog, this
>>>>>>>>>>>>>>>>>>>> is a non-ambiguous and reasonable fallback behavior.
>>>>>>>>>>>>>>>>>>>> - If a view refers to tables from another catalog, catalog
>>>>>>>>>>>>>>>>>>>> names should be included in the reference name already. So
>>>>>>>>>>>>>>>>>>>> no ambiguity there either.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Potential inconsistent naming of catalogs is a separate
>>>>>>>>>>>>>>>>>>>> problem, which the Iceberg view spec probably cannot solve.
>>>>>>>>>>>>>>>>>>>> We can only recommend that catalogs be named consistently
>>>>>>>>>>>>>>>>>>>> across usage for better interoperability on name references.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> This proposal is to change the fallback behavior to the
>>>>>>>>>>>>>>>>>>>> engine's session default catalog. I am not sure it is better
>>>>>>>>>>>>>>>>>>>> than the current fallback behavior.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> > Today’s Spark behavior explicitly differs from this
>>>>>>>>>>>>>>>>>>>> > idea. Spark resolves table identifiers during view
>>>>>>>>>>>>>>>>>>>> > creation using the session’s default catalog, not a
>>>>>>>>>>>>>>>>>>>> > supplied `default-catalog`.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I would argue that is a Spark implementation issue for not
>>>>>>>>>>>>>>>>>>>> conforming to the spec.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Fri, Apr 25, 2025 at 1:17 PM Walaa Eldin Moustafa
>>>>>>>>>>>>>>>>>>>> <wa.moust...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>> > Hi Jan,
>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>> > Thanks again for continuing the discussion. I want to
>>>>>>>>>>>>>>>>>>>> > highlight a few fundamental issues around the
>>>>>>>>>>>>>>>>>>>> > interpretation of default-catalog:
>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>> > Here is the real catch:
>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>> > * default-catalog cannot logically be defined at view
>>>>>>>>>>>>>>>>>>>> > creation time. It would be circular: the view needs to
>>>>>>>>>>>>>>>>>>>> > exist before its metadata (and hence default-catalog) can
>>>>>>>>>>>>>>>>>>>> > exist. This is visible in Spark’s implementation, where
>>>>>>>>>>>>>>>>>>>> > `default-catalog` is not used.
>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>> > * Introducing a creation-time default-catalog setting
>>>>>>>>>>>>>>>>>>>> > would require extending SQL syntax and engine APIs to
>>>>>>>>>>>>>>>>>>>> > promote it to a first-class view concept. This would be
>>>>>>>>>>>>>>>>>>>> > intrusive, non-intuitive, and realistically very difficult
>>>>>>>>>>>>>>>>>>>> > to standardize across engines.
>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>> > * Today’s Spark behavior explicitly differs from this
>>>>>>>>>>>>>>>>>>>> > idea. Spark resolves table identifiers during view
>>>>>>>>>>>>>>>>>>>> > creation using the session’s default catalog, not a
>>>>>>>>>>>>>>>>>>>> > supplied `default-catalog`.
>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>> > * Hypothetically even if we patched in a creation-time
>>>>>>>>>>>>>>>>>>>> > default-catalog, it would create an inconsistent binding
>>>>>>>>>>>>>>>>>>>> > model between tables vs views (early vs late), and between
>>>>>>>>>>>>>>>>>>>> > tables in views and in queries (again early vs late). For
>>>>>>>>>>>>>>>>>>>> > example, views and tables in queries can withstand default
>>>>>>>>>>>>>>>>>>>> > catalog renames, but tables cannot when they are used
>>>>>>>>>>>>>>>>>>>> > inside views -- it even applies to views inside views,
>>>>>>>>>>>>>>>>>>>> > which makes this very hard to reason about considering
>>>>>>>>>>>>>>>>>>>> > nesting.
>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>> > Thanks,
>>>>>>>>>>>>>>>>>>>> > Walaa
>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>> > On Fri, Apr 25, 2025 at 7:00 AM Jan Kaul
>>>>>>>>>>>>>>>>>>>> > <jank...@mailbox.org.invalid> wrote:
>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>> >> @Walaa:
>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>> >> I would argue that when you run a CREATE VIEW
>>>>>>>>>>>>>>>>>>>> statement the query engine knowns which catalog the view 
>>>>>>>>>>>>>>>>>>>> is being created
>>>>>>>>>>>>>>>>>>>> in. So even though we typically use late binding to 
>>>>>>>>>>>>>>>>>>>> resolve the view
>>>>>>>>>>>>>>>>>>>> catalog at query time, it can also be used at creation 
>>>>>>>>>>>>>>>>>>>> time.
>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>> >> The query engine would need to keep track of the
>>>>>>>>>>>>>>>>>>>> "view catalog" in which the view is going to be created. 
>>>>>>>>>>>>>>>>>>>> It can use that
>>>>>>>>>>>>>>>>>>>> catalog to resolve partial table identifiers if 
>>>>>>>>>>>>>>>>>>>> "default-catalog" is not
>>>>>>>>>>>>>>>>>>>> set.
>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>> >> It can lead to some unintuitive behavior, where
>>>>>>>>>>>>>>>>>>>> partial identifiers in the view query resolve to a 
>>>>>>>>>>>>>>>>>>>> different catalog
>>>>>>>>>>>>>>>>>>>> compared to using them outside of a view.
>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>> >> CREATE VIEW catalogA.sales.monthly_orders AS SELECT
>>>>>>>>>>>>>>>>>>>> * from sales.orders;
>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>> >> If the session default catalog is not "catalogA",
>>>>>>>>>>>>>>>>>>>> the "sales.orders" in the view query would not be the same 
>>>>>>>>>>>>>>>>>>>> as just
>>>>>>>>>>>>>>>>>>>> referencing "sales.orders" in a normal SQL statement. This 
>>>>>>>>>>>>>>>>>>>> is because
>>>>>>>>>>>>>>>>>>>> without a "default-catalog", the catalog name of 
>>>>>>>>>>>>>>>>>>>> "sales.orders" would
>>>>>>>>>>>>>>>>>>>> default to "catalogA", which is the view's catalog.
>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>> >> Thanks,
>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>> >> Jan
>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>> >> On 4/25/25 04:05, Manu Zhang wrote:
>>>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>>>> >>> For example, if we want to validate that the tables
>>>>>>>>>>>>>>>>>>>> referenced in the view exist, how can we do that when 
>>>>>>>>>>>>>>>>>>>> default-catalog isn't
>>>>>>>>>>>>>>>>>>>> defined, since the view hasn't been created or loaded yet?
>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>> >> I don't think this is related to view spec. How do
>>>>>>>>>>>>>>>>>>>> we validate that a table exists without a default catalog, 
>>>>>>>>>>>>>>>>>>>> or do we always
>>>>>>>>>>>>>>>>>>>> use the current session catalog?
>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>> >> Thanks,
>>>>>>>>>>>>>>>>>>>> >> Manu
>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>> >> On Fri, Apr 25, 2025 at 5:59 AM Walaa Eldin Moustafa
>>>>>>>>>>>>>>>>>>>> <wa.moust...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>>>> >>> Hi Jan,
>>>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>>>> >>> I think we still share the same understanding. Just
>>>>>>>>>>>>>>>>>>>> to clarify: when I referred to late binding as “similar” 
>>>>>>>>>>>>>>>>>>>> to the proposal, I
>>>>>>>>>>>>>>>>>>>> was acknowledging the distinction between view-level and 
>>>>>>>>>>>>>>>>>>>> table-level
>>>>>>>>>>>>>>>>>>>> resolution. But as you noted, both follow a late binding 
>>>>>>>>>>>>>>>>>>>> model.
>>>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>>>> >>> That said, this still raises an interesting
>>>>>>>>>>>>>>>>>>>> question and a potential gap: if default-catalog is only 
>>>>>>>>>>>>>>>>>>>> defined at query
>>>>>>>>>>>>>>>>>>>> time, how should resolution work during view creation? For 
>>>>>>>>>>>>>>>>>>>> example, if we
>>>>>>>>>>>>>>>>>>>> want to validate that the tables referenced in the view 
>>>>>>>>>>>>>>>>>>>> exist, how can we
>>>>>>>>>>>>>>>>>>>> do that when default-catalog isn't defined, since the view 
>>>>>>>>>>>>>>>>>>>> hasn't been
>>>>>>>>>>>>>>>>>>>> created or loaded yet?
>>>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>>>> >>> Thanks,
>>>>>>>>>>>>>>>>>>>> >>> Walaa.
>>>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>>>> >>> On Thu, Apr 24, 2025 at 7:02 AM Jan Kaul
>>>>>>>>>>>>>>>>>>>> <jank...@mailbox.org.invalid> wrote:
>>>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>>>> >>>> Yes, I have the same understanding. The view
>>>>>>>>>>>>>>>>>>>> catalog is resolved at query time.
>>>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>>>> >>>> As you mentioned before, it's good to distinguish
>>>>>>>>>>>>>>>>>>>> between the physical catalog and its reference used in 
>>>>>>>>>>>>>>>>>>>> SQL statements. The
>>>>>>>>>>>>>>>>>>>> important part is that the physical catalog of the view 
>>>>>>>>>>>>>>>>>>>> and the tables
>>>>>>>>>>>>>>>>>>>> referenced in its definition stay consistent. You could 
>>>>>>>>>>>>>>>>>>>> create a view in a
>>>>>>>>>>>>>>>>>>>> given physical catalog by referring to it as "catalogA", 
>>>>>>>>>>>>>>>>>>>> as in your first
>>>>>>>>>>>>>>>>>>>> point. If you then, given a different setup, refer to the 
>>>>>>>>>>>>>>>>>>>> same physical
>>>>>>>>>>>>>>>>>>>> catalog as "catalogB" in another session/environment, the 
>>>>>>>>>>>>>>>>>>>> behavior should
>>>>>>>>>>>>>>>>>>>> still work.
>>>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>>>> >>>> I would however rephrase your last point. Late
>>>>>>>>>>>>>>>>>>>> binding applies to the view catalog name and by extension 
>>>>>>>>>>>>>>>>>>>> to all partial
>>>>>>>>>>>>>>>>>>>> table references when no "default-catalog" is present. 
>>>>>>>>>>>>>>>>>>>> Resolving the view
>>>>>>>>>>>>>>>>>>>> catalog name at query time is not opposed to storing the 
>>>>>>>>>>>>>>>>>>>> view metadata in a
>>>>>>>>>>>>>>>>>>>> catalog.
>>>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>>>> >>>> Or maybe I don't entirely understand what you mean.
>>>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>>>> >>>> Thanks
>>>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>>>> >>>> Jan
>>>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>>>> >>>> On 4/24/25 00:32, Walaa Eldin Moustafa wrote:
>>>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>>>> >>>> Hi Jan,
>>>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>>>> >>>> > The view is executed when it's being referenced
>>>>>>>>>>>>>>>>>>>> in a SQL statement. That statement contains the 
>>>>>>>>>>>>>>>>>>>> information for the query
>>>>>>>>>>>>>>>>>>>> engine to resolve the catalog of the view.
>>>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>>>> >>>> If I’m understanding correctly, that means:
>>>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>>>> >>>> * If the view is queried as SELECT * FROM
>>>>>>>>>>>>>>>>>>>> catalogA.namespace.view, then catalogA is considered the 
>>>>>>>>>>>>>>>>>>>> view’s catalog.
>>>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>>>> >>>> * If the same view is later queried as SELECT *
>>>>>>>>>>>>>>>>>>>> FROM catalogB.namespace.view (after renaming catalogA to 
>>>>>>>>>>>>>>>>>>>> catalogB, and
>>>>>>>>>>>>>>>>>>>> keeping everything else the same), then catalogB becomes 
>>>>>>>>>>>>>>>>>>>> the view’s catalog.
>>>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>>>> >>>> Is that interpretation correct? If so, it sounds
>>>>>>>>>>>>>>>>>>>> to me like the catalog is resolved at query time, based on 
>>>>>>>>>>>>>>>>>>>> how the view is
>>>>>>>>>>>>>>>>>>>> referenced, not from any stored metadata. That would imply 
>>>>>>>>>>>>>>>>>>>> some sort of a
>>>>>>>>>>>>>>>>>>>> late binding behavior (similar to the proposal), as 
>>>>>>>>>>>>>>>>>>>> opposed to using some
>>>>>>>>>>>>>>>>>>>> catalog that "stores" the view definition.
>>>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>>>> >>>> Thanks,
>>>>>>>>>>>>>>>>>>>> >>>> Walaa
>>>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>>>> >>>> On Tue, Apr 22, 2025 at 11:01 AM Jan Kaul
>>>>>>>>>>>>>>>>>>>> <jank...@mailbox.org.invalid> wrote:
>>>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>>>> >>>>> Hi Walaa,
>>>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>>>> >>>>> Thanks for clarifying the aspects of
>>>>>>>>>>>>>>>>>>>> non-determinism. Let me try to address your questions.
>>>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>>>> >>>>> 1. This is my interpretation of the current spec:
>>>>>>>>>>>>>>>>>>>> The view is executed when it's being referenced in a SQL 
>>>>>>>>>>>>>>>>>>>> statement. That
>>>>>>>>>>>>>>>>>>>> statement contains the information for the query engine to 
>>>>>>>>>>>>>>>>>>>> resolve the
>>>>>>>>>>>>>>>>>>>> catalog of the view. The query engine then uses that 
>>>>>>>>>>>>>>>>>>>> information to fetch
>>>>>>>>>>>>>>>>>>>> the view metadata from the catalog. It also needs to 
>>>>>>>>>>>>>>>>>>>> temporarily keep track
>>>>>>>>>>>>>>>>>>>> of which catalog it used to fetch the view metadata. It 
>>>>>>>>>>>>>>>>>>>> can then use that
>>>>>>>>>>>>>>>>>>>> information to resolve the table references in the view's 
>>>>>>>>>>>>>>>>>>>> SQL definition in
>>>>>>>>>>>>>>>>>>>> case no default catalog is specified.
>>>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>>>> >>>>> 2. The important part is that the catalog can be
>>>>>>>>>>>>>>>>>>>> referenced at execution time. As long as that's the case I 
>>>>>>>>>>>>>>>>>>>> would assume the
>>>>>>>>>>>>>>>>>>>> view can be created in any catalog.
>>>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>>>> >>>>> I think your point is really valuable because the
>>>>>>>>>>>>>>>>>>>> current specification can lead to some unintuitive 
>>>>>>>>>>>>>>>>>>>> behavior. For example
>>>>>>>>>>>>>>>>>>>> for the following statement:
>>>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>>>> >>>>> CREATE VIEW catalogA.sales.monthly_orders AS
>>>>>>>>>>>>>>>>>>>> SELECT * from sales.orders;
>>>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>>>> >>>>> If the session default catalog is not "catalogA",
>>>>>>>>>>>>>>>>>>>> the "sales.orders" in the view query would not be the same 
>>>>>>>>>>>>>>>>>>>> as just
>>>>>>>>>>>>>>>>>>>> referencing "sales.orders" in a normal SQL statement. This 
>>>>>>>>>>>>>>>>>>>> is because
>>>>>>>>>>>>>>>>>>>> without a "default-catalog", the catalog name of 
>>>>>>>>>>>>>>>>>>>> "sales.orders" would
>>>>>>>>>>>>>>>>>>>> default to "catalogA".
>>>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>>>> >>>>> However, I like the current design of the view
>>>>>>>>>>>>>>>>>>>> spec, because it has the "closure" property. Because of 
>>>>>>>>>>>>>>>>>>>> the fact that the
>>>>>>>>>>>>>>>>>>>> "view catalog" has to be known when executing a view, all 
>>>>>>>>>>>>>>>>>>>> the information
>>>>>>>>>>>>>>>>>>>> required to resolve the table identifiers is contained in 
>>>>>>>>>>>>>>>>>>>> the view metadata
>>>>>>>>>>>>>>>>>>>> (and the "view catalog"). I think that if you make the 
>>>>>>>>>>>>>>>>>>>> identifier
>>>>>>>>>>>>>>>>>>>> resolution dependent on external parameters, it hinders 
>>>>>>>>>>>>>>>>>>>> portability.
>>>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>>>> >>>>> Thanks,
>>>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>>>> >>>>> Jan
>>>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>>>> >>>>> On 4/22/25 18:36, Walaa Eldin Moustafa wrote:
>>>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>>>> >>>>> Hi Jan,
>>>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>>>> >>>>> Thanks for the thoughtful feedback.
>>>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>>>> >>>>> I think it’s important we clarify a key point
>>>>>>>>>>>>>>>>>>>> before going deeper:
>>>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>>>> >>>>> Non-determinism is not caused by session fallback
>>>>>>>>>>>>>>>>>>>> behavior—it’s a fundamental limitation of using table 
>>>>>>>>>>>>>>>>>>>> identifiers alone,
>>>>>>>>>>>>>>>>>>>> regardless of whether we use the current rule, the 
>>>>>>>>>>>>>>>>>>>> proposed fallback to the
>>>>>>>>>>>>>>>>>>>> session’s default catalog, or even early vs. late binding.
>>>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>>>> >>>>> The same fully qualified identifier (e.g.,
>>>>>>>>>>>>>>>>>>>> catalogA.namespace.table) can resolve to different objects 
>>>>>>>>>>>>>>>>>>>> depending solely
>>>>>>>>>>>>>>>>>>>> on engine-specific routing logic or catalog aliases. So 
>>>>>>>>>>>>>>>>>>>> determinism isn’t
>>>>>>>>>>>>>>>>>>>> guaranteed just because an identifier is "fully 
>>>>>>>>>>>>>>>>>>>> qualified." The only
>>>>>>>>>>>>>>>>>>>> reliable anchor for identity is the UUID. That’s why the 
>>>>>>>>>>>>>>>>>>>> proposed use of
>>>>>>>>>>>>>>>>>>>> UUIDs is not just a hardening strategy. It’s the actual 
>>>>>>>>>>>>>>>>>>>> fix for correctness.
>>>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>>>> >>>>> To move the conversation forward, could you help
>>>>>>>>>>>>>>>>>>>> clarify two things in the context of the current spec:
>>>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>>>> >>>>> * Where in the metadata is the “view catalog”
>>>>>>>>>>>>>>>>>>>> stored, so that an engine knows to fall back to it if 
>>>>>>>>>>>>>>>>>>>> default-catalog is
>>>>>>>>>>>>>>>>>>>> null?
>>>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>>>> >>>>> * Are we even allowed to create views in the
>>>>>>>>>>>>>>>>>>>> session's default catalog (i.e., without specifying a 
>>>>>>>>>>>>>>>>>>>> catalog) in the
>>>>>>>>>>>>>>>>>>>> current Iceberg spec?
>>>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>>>> >>>>> These questions are important because if we can’t
>>>>>>>>>>>>>>>>>>>> unambiguously recover the "view catalog" from metadata, 
>>>>>>>>>>>>>>>>>>>> then defaulting to
>>>>>>>>>>>>>>>>>>>> it is problematic. And if views can't be created in the 
>>>>>>>>>>>>>>>>>>>> default catalog,
>>>>>>>>>>>>>>>>>>>> then the fallback rule doesn’t generalize.
>>>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>>>> >>>>> Thanks,
>>>>>>>>>>>>>>>>>>>> >>>>> Walaa.
>>>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>>>> >>>>> On Tue, Apr 22, 2025 at 3:14 AM Jan Kaul
>>>>>>>>>>>>>>>>>>>> <jank...@mailbox.org.invalid> wrote:
>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>> Hi Walaa,
>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>> thank you for your proposal. If I understood
>>>>>>>>>>>>>>>>>>>> correctly, your proposal is composed of three parts:
>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>> - session default catalog as fallback for
>>>>>>>>>>>>>>>>>>>> "default-catalog"
>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>> - session default namespace as fallback for
>>>>>>>>>>>>>>>>>>>> "default-namespace"
>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>> - Late binding + UUID validation
>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>> I have some comments regarding these points.
>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>> 1. Session default catalog as fallback for
>>>>>>>>>>>>>>>>>>>> "default-catalog"
>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>> Introducing a behavior that depends on the
>>>>>>>>>>>>>>>>>>>> current session setup is in my opinion the definition of 
>>>>>>>>>>>>>>>>>>>> "non-determinism".
>>>>>>>>>>>>>>>>>>>> You could be running the same query-engine and 
>>>>>>>>>>>>>>>>>>>> catalog-setup on different
>>>>>>>>>>>>>>>>>>>> days, with different default session catalogs (which is 
>>>>>>>>>>>>>>>>>>>> rather common), and
>>>>>>>>>>>>>>>>>>>> would be getting different results.
>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>> Whereas with the current behavior, the view
>>>>>>>>>>>>>>>>>>>> always produces the same results. The current behavior has 
>>>>>>>>>>>>>>>>>>>> some rough edges
>>>>>>>>>>>>>>>>>>>> in very niche use cases, but I think it is solid for most use 
>>>>>>>>>>>>>>>>>>>> cases.
>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>> 2. Session default namespace as fallback for
>>>>>>>>>>>>>>>>>>>> "default-namespace"
>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>> Similar to the above.
>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>> 3. Late binding + UUID validation
>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>> If I understand it correctly, the current
>>>>>>>>>>>>>>>>>>>> implementation already uses late binding.
>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>> Generally, having UUID validation makes the
>>>>>>>>>>>>>>>>>>>> setup more robust. Which is great. However, having UUID 
>>>>>>>>>>>>>>>>>>>> validation still
>>>>>>>>>>>>>>>>>>>> requires us to have a portable table identifier 
>>>>>>>>>>>>>>>>>>>> specification. Even if we
>>>>>>>>>>>>>>>>>>>> have the UUIDs of the referenced tables from the view, 
>>>>>>>>>>>>>>>>>>>> there simply isn't
>>>>>>>>>>>>>>>>>>>> an interface that lets us use those UUIDs. The catalog 
>>>>>>>>>>>>>>>>>>>> interface is
>>>>>>>>>>>>>>>>>>>> defined in terms of table identifiers.
>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>> So we always require a working catalog setup and
>>>>>>>>>>>>>>>>>>>> suitable table identifiers to obtain the table metadata. We 
>>>>>>>>>>>>>>>>>>>> can use the
>>>>>>>>>>>>>>>>>>>> UUIDs to verify if we loaded the correct table. But this 
>>>>>>>>>>>>>>>>>>>> can only be done
>>>>>>>>>>>>>>>>>>>> after we used some identifier. Which means there is no way 
>>>>>>>>>>>>>>>>>>>> of using UUIDs
>>>>>>>>>>>>>>>>>>>> without a functioning catalog/identifier setup.
>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>> In conclusion, I prefer the current behavior for
>>>>>>>>>>>>>>>>>>>> "default-catalog" because it is more deterministic in my 
>>>>>>>>>>>>>>>>>>>> opinion. And I
>>>>>>>>>>>>>>>>>>>> think the current spec does a good job for multi-engine 
>>>>>>>>>>>>>>>>>>>> table identifier
>>>>>>>>>>>>>>>>>>>> resolution. I see the UUID validation more as an 
>>>>>>>>>>>>>>>>>>>> additional hardening
>>>>>>>>>>>>>>>>>>>> strategy.
>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>> Thanks
>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>> Jan
>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>> On 4/21/25 17:38, Walaa Eldin Moustafa wrote:
>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>> Thanks Renjie!
>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>> The existing spec has some guidance on resolving
>>>>>>>>>>>>>>>>>>>> catalogs on the fly already (to address the case of view 
>>>>>>>>>>>>>>>>>>>> text with table
>>>>>>>>>>>>>>>>>>>> identifiers missing the catalog part). The guidance is to 
>>>>>>>>>>>>>>>>>>>> use the catalog
>>>>>>>>>>>>>>>>>>>> where the view is stored. But I find this rule hard to 
>>>>>>>>>>>>>>>>>>>> interpret or use.
>>>>>>>>>>>>>>>>>>>> The catalog itself is a logical construct—such as a 
>>>>>>>>>>>>>>>>>>>> federated catalog that
>>>>>>>>>>>>>>>>>>>> delegates to multiple physical backends (e.g., HMS and 
>>>>>>>>>>>>>>>>>>>> REST). In such
>>>>>>>>>>>>>>>>>>>> cases, the catalog (e.g., `my_catalog` in 
>>>>>>>>>>>>>>>>>>>> `my_catalog.namespace1.table1`)
>>>>>>>>>>>>>>>>>>>> doesn’t physically store the tables; it only routes 
>>>>>>>>>>>>>>>>>>>> requests to underlying
>>>>>>>>>>>>>>>>>>>> stores. Therefore, defaulting identifier resolution based 
>>>>>>>>>>>>>>>>>>>> on the catalog
>>>>>>>>>>>>>>>>>>>> where the view is "stored" doesn’t align with how catalogs 
>>>>>>>>>>>>>>>>>>>> actually behave
>>>>>>>>>>>>>>>>>>>> in practice.
>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>> >>>>>> Walaa.
>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>> On Sun, Apr 20, 2025 at 11:17 PM Renjie Liu <
>>>>>>>>>>>>>>>>>>>> liurenjie2...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>> Hi, Walaa:
>>>>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>> Thanks for the proposal.
>>>>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>> I've reviewed the doc, but in general I have
>>>>>>>>>>>>>>>>>>>> some concerns with resolving catalog names on the fly with 
>>>>>>>>>>>>>>>>>>>> query engine
>>>>>>>>>>>>>>>>>>>> defined catalog names. This introduces some flexibility at 
>>>>>>>>>>>>>>>>>>>> first glance,
>>>>>>>>>>>>>>>>>>>> but also makes misconfiguration difficult to explain.
>>>>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>> But I agree with one part that we should store
>>>>>>>>>>>>>>>>>>>> resolved table UUIDs in view metadata, as table/view 
>>>>>>>>>>>>>>>>>>>> renaming may introduce
>>>>>>>>>>>>>>>>>>>> errors that are difficult for users to understand.
>>>>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>> On Sat, Apr 19, 2025 at 3:02 AM Walaa Eldin
>>>>>>>>>>>>>>>>>>>> Moustafa <wa.moust...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>> Hi Everyone,
>>>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>> Looking forward to keeping up the momentum and
>>>>>>>>>>>>>>>>>>>> closing out the MV spec as well. I’m hoping we can proceed 
>>>>>>>>>>>>>>>>>>>> to a vote next
>>>>>>>>>>>>>>>>>>>> week.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>> Here is a summary in case that helps. The
>>>>>>>>>>>>>>>>>>>> proposal outlines a strategy for handling table 
>>>>>>>>>>>>>>>>>>>> identifiers in Iceberg view
>>>>>>>>>>>>>>>>>>>> metadata, with the goal of ensuring correctness, 
>>>>>>>>>>>>>>>>>>>> portability, and engine
>>>>>>>>>>>>>>>>>>>> compatibility. It recommends resolving table identifiers 
>>>>>>>>>>>>>>>>>>>> at read time (late
>>>>>>>>>>>>>>>>>>>> binding) rather than creation time, and introduces 
>>>>>>>>>>>>>>>>>>>> UUID-based validation to
>>>>>>>>>>>>>>>>>>>> maintain identity guarantees across engines or sessions. 
>>>>>>>>>>>>>>>>>>>> It also revises
>>>>>>>>>>>>>>>>>>>> how default-catalog and default-namespace are handled 
>>>>>>>>>>>>>>>>>>>> (defaulting both to
>>>>>>>>>>>>>>>>>>>> the session context if not explicitly set) to better align 
>>>>>>>>>>>>>>>>>>>> with engine
>>>>>>>>>>>>>>>>>>>> behavior and improve cross-engine interoperability.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>> Please let me know your thoughts.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>> >>>>>>>> Walaa.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>> On Wed, Apr 16, 2025 at 2:03 PM Walaa Eldin
>>>>>>>>>>>>>>>>>>>> Moustafa <wa.moust...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>> Thanks Eduard and Sung! I have addressed the
>>>>>>>>>>>>>>>>>>>> comments.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>> One key point to keep in mind is that catalog
>>>>>>>>>>>>>>>>>>>> names in the spec refer to logical catalogs—i.e., the 
>>>>>>>>>>>>>>>>>>>> first part of a
>>>>>>>>>>>>>>>>>>>> three-part identifier. These correspond to Spark's 
>>>>>>>>>>>>>>>>>>>> DataSourceV2 catalogs,
>>>>>>>>>>>>>>>>>>>> Trino connectors, and similar constructs. This is a level 
>>>>>>>>>>>>>>>>>>>> of abstraction
>>>>>>>>>>>>>>>>>>>> above physical catalogs, which are not referenced or used 
>>>>>>>>>>>>>>>>>>>> in the view spec.
>>>>>>>>>>>>>>>>>>>> The reason is that table identifiers in the view 
>>>>>>>>>>>>>>>>>>>> definition/text itself
>>>>>>>>>>>>>>>>>>>> refer to logical catalogs, not physical ones (since they 
>>>>>>>>>>>>>>>>>>>> interface directly
>>>>>>>>>>>>>>>>>>>> with the engine and not a specific metastore).
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>> >>>>>>>>> Walaa.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>> On Wed, Apr 16, 2025 at 6:15 AM Sung Yun <
>>>>>>>>>>>>>>>>>>>> sungwy...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> Thank you Walaa for the proposal. I think
>>>>>>>>>>>>>>>>>>>> view portability is a very important topic for us to 
>>>>>>>>>>>>>>>>>>>> continue discussing as
>>>>>>>>>>>>>>>>>>>> it relies on many assumptions within the data ecosystem 
>>>>>>>>>>>>>>>>>>>> for it to function,
>>>>>>>>>>>>>>>>>>>> as you've highlighted well in the document.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> I've added a few comments around how this
>>>>>>>>>>>>>>>>>>>> may impact the permission questions the engines will be 
>>>>>>>>>>>>>>>>>>>> asking, and whether
>>>>>>>>>>>>>>>>>>>> that is the desired behavior.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> Sung
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> On Wed, Apr 16, 2025 at 7:32 AM Eduard
>>>>>>>>>>>>>>>>>>>> Tudenhöfner <etudenhoef...@apache.org> wrote:
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> Thanks Walaa for tackling this problem.
>>>>>>>>>>>>>>>>>>>> I've added a few comments to get a better understanding of 
>>>>>>>>>>>>>>>>>>>> what this will
>>>>>>>>>>>>>>>>>>>> look like in the actual implementation.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> Eduard
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> On Tue, Apr 15, 2025 at 7:09 PM Walaa Eldin
>>>>>>>>>>>>>>>>>>>> Moustafa <wa.moust...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Hi Everyone,
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Starting this thread to resume our
>>>>>>>>>>>>>>>>>>>> discussion on how to reference table identifiers from 
>>>>>>>>>>>>>>>>>>>> Iceberg metadata, a
>>>>>>>>>>>>>>>>>>>> key aspect of the view specification, particularly in 
>>>>>>>>>>>>>>>>>>>> relation to the MV
>>>>>>>>>>>>>>>>>>>> (materialized view) extensions.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> I had the chance to speak offline with a
>>>>>>>>>>>>>>>>>>>> few community members to better understand how the current 
>>>>>>>>>>>>>>>>>>>> spec is being
>>>>>>>>>>>>>>>>>>>> interpreted. Those conversations served as inputs to a new 
>>>>>>>>>>>>>>>>>>>> proposal on how
>>>>>>>>>>>>>>>>>>>> table identifier references could be represented in 
>>>>>>>>>>>>>>>>>>>> metadata.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> You can find the proposal here [1]. I look
>>>>>>>>>>>>>>>>>>>> forward to your feedback and working together to move this 
>>>>>>>>>>>>>>>>>>>> forward so we
>>>>>>>>>>>>>>>>>>>> can finalize the MV spec as well.
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1-I2v_OqBgJi_8HVaeH1u2jowghmXoB8XaJLzPBa_Hg8/edit?tab=t.0
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Walaa.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
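
For readers following the thread, the two mechanisms under discussion can be
sketched roughly as below. This is only an illustration in Python, not Iceberg
code: the names `resolve_identifier` and `load_and_validate`, the dict-based
catalogs, and the rule that identifiers are split on dots into at most three
parts (single-level namespaces only) are all assumptions made for the example.
It shows the fallback Jan describes (use "default-catalog" if it is set,
otherwise the catalog the view was loaded from) and his point that a stored
UUID can only be checked after the table has been loaded by identifier.

```python
from typing import Dict, Optional, Tuple

# (catalog, namespace, table). Hypothetical simplification: single-level namespaces.
Identifier = Tuple[str, str, str]


def resolve_identifier(
    identifier: str,
    view_catalog: str,
    default_namespace: str,
    default_catalog: Optional[str] = None,
) -> Identifier:
    """Resolve a possibly partial table identifier found in a view's SQL text.

    Assumed rule for this sketch: three parts are fully qualified, two parts
    are namespace.table, one part is a bare table name.
    """
    parts = identifier.split(".")
    if len(parts) == 3:
        return (parts[0], parts[1], parts[2])
    # Fallback discussed in the thread: "default-catalog" if set,
    # otherwise the catalog the view itself was loaded from.
    catalog = default_catalog if default_catalog is not None else view_catalog
    if len(parts) == 2:
        return (catalog, parts[0], parts[1])
    if len(parts) == 1:
        return (catalog, default_namespace, parts[0])
    raise ValueError(f"unsupported identifier: {identifier!r}")


def load_and_validate(
    catalogs: Dict[str, Dict[Tuple[str, str], dict]],
    ident: Identifier,
    expected_uuid: Optional[str] = None,
) -> dict:
    """Load a table by identifier, then check its UUID if one was recorded.

    The lookup by identifier necessarily happens first; the UUID is only a
    safety check on whatever that lookup returned.
    """
    catalog, namespace, name = ident
    table = catalogs[catalog][(namespace, name)]
    if expected_uuid is not None and table["uuid"] != expected_uuid:
        raise ValueError(
            f"{catalog}.{namespace}.{name} resolved to UUID {table['uuid']}, "
            f"expected {expected_uuid}"
        )
    return table


if __name__ == "__main__":
    # Jan's example: a view in catalogA whose SQL text references sales.orders.
    catalogs = {"catalogA": {("sales", "orders"): {"uuid": "u-1"}}}
    ident = resolve_identifier(
        "sales.orders", view_catalog="catalogA", default_namespace="sales"
    )
    print(ident)
    print(load_and_validate(catalogs, ident, expected_uuid="u-1"))
```

Where the expected UUID would be stored (view metadata or storage table
state) is the open question in the thread; the sketch takes no position on it
and simply accepts the value as a parameter.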
