Hi Jan, I do not think this is feasible because it assumes the catalog
identifiers do not collide across catalogs. Anyways, let us not over
engineer this use case. As I mentioned, it was for illustration purposes.

Since the discussion moved from “UUIDs vs Sequence numbers” to “UUIDs vs
catalog table identifiers”, I can help summarize the discussion on the new
options in a doc and we can iterate on it so we are on the same page on the
pros and cons. It might help others catch up on this discussion too.

Thanks,
Walaa.

On Tue, Aug 20, 2024 at 11:31 PM Jan Kaul <jank...@mailbox.org.invalid>
wrote:

> Hi Walaa,
>
> aside from the fact that this is not a desired setup this is not an
> argument against using a refresh-state with identifiers as the key. In the
> case you're describing you would just have two entries with different
> identifiers in the refresh-state referencing the same snapshot-id. You
> would still be able to determine the freshness of the precomputed data.
>
> Thanks,
>
> Jan
> On 20.08.24 18:53, Walaa Eldin Moustafa wrote:
>
> Theoretically, we could have multiple catalogs each with different table
> name entries but referring to the same Iceberg table metadata, and hence
> same UUIDs (view metadata cannot be shared since they are strongly bound to
> the catalog identifiers). I understand this is not an everyday scenario but
> it illustrates the semantic gap.
>
> Thanks,
> Walaa.
>
>
> On Mon, Aug 19, 2024 at 10:54 PM Benny Chow <btc...@gmail.com> wrote:
>
>> Hi Walaa, I personally don't see a semantic issue with putting the table
>> identifiers in the refresh state.  The purpose of the refresh state is to
>> basically take a snapshot of the table and view versions at the time of
>> materialization.  Directly using table identifiers seems pretty natural to
>> me.   So, I'm +1 for:
>>
>> no lineage
>> + refresh-state key = identifier
>>
>> Thanks
>> Benny
>>
>>
>>
>> On Mon, Aug 19, 2024 at 9:19 PM Walaa Eldin Moustafa <
>> wa.moust...@gmail.com> wrote:
>>
>>> Hi Micah, it is mostly about the typical results of denormalization such
>>> as data consistency, management complexity, integrity, etc. However, as
>>> mentioned earlier, the main reason would be the semantic gap around using
>>> catalog table identifiers as a concept in the table (more specifically
>>> snapshot summary) spec. Denormalization in this case is a minor issue.
>>>
>>> Thanks,
>>> Walaa.
>>>
>>>
>>> On Mon, Aug 19, 2024 at 4:44 PM Micah Kornfield <emkornfi...@gmail.com>
>>> wrote:
>>>
>>>> Thanks Micah, for the latter, I meant the type of denormalization of
>>>>> repeating a 3-part name as opposed to using an ID.
>>>>
>>>>
>>>> Is the concern here just metadata size or something else?  For size I
>>>> think if this is really anticipated to be a problem that it is likely for
>>>> the state map in general, and we could investigate some more sophisticated
>>>> encodings for the State map even without the overlap. I think maybe this
>>>> could be handled if it proves to be a problem but hopefully engines are
>>>> placing a reasonable cap on view depth + number of tables per view which
>>>> puts an upper bound on overall size.
>>>>
>>>> Thanks,
>>>> Micah
>>>>
>>>> On Fri, Aug 16, 2024 at 4:56 PM Walaa Eldin Moustafa <
>>>> wa.moust...@gmail.com> wrote:
>>>>
>>>>> Thanks Micah, for the latter, I meant the type of denormalization of
>>>>> repeating a 3-part name as opposed to using an ID.
>>>>>
>>>>> On Fri, Aug 16, 2024 at 4:52 PM Micah Kornfield <emkornfi...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> However, this still does not address the semantic issue which is more
>>>>>>> fundamental in my opinion. The Iceberg table spec is not aware of 
>>>>>>> catalog
>>>>>>> table identifiers and this use will be the first break of this 
>>>>>>> abstraction.
>>>>>>
>>>>>>
>>>>>> IIUC, based on Jan's comments, we are not going to modify the table
>>>>>> specification.  I thought the state map was effectively opaque metadata
>>>>>> from the table specification perspective?  If this is the case I feel 
>>>>>> like
>>>>>> that is OK and not a blocker, I think by their nature as already 
>>>>>> discussed
>>>>>> MVs need catalog information to function properly and the choice to put
>>>>>> catalog information into the table metadata is pragmatic and preserves
>>>>>> other desirable properties.  It might be a more important point if we 
>>>>>> want
>>>>>> to update the table specification (IMO, I still think it would probably 
>>>>>> be
>>>>>> OK).
>>>>>>
>>>>>> On a side note, it does not address the denormalization issue either
>>>>>>> if we ever want to introduce the lineage in the view as a nice-to-have.
>>>>>>
>>>>>>
>>>>>> I think if lineage is introduced to the View metadata, it should only
>>>>>> hold direct dependencies for the reasons already discussed. IMO, I think
>>>>>> the potential overlap is OK as they serve two different purposes.
>>>>>>
>>>>>> Cheers,
>>>>>> Micah
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Aug 16, 2024 at 4:27 PM Walaa Eldin Moustafa <
>>>>>> wa.moust...@gmail.com> wrote:
>>>>>>
>>>>>>> That is right. I agree that in the case of using catalog identifiers
>>>>>>> in state information, using them in lineage information would be a
>>>>>>> nice-to-have and not a requirement.
>>>>>>>
>>>>>>> However, this still does not address the semantic issue which is
>>>>>>> more fundamental in my opinion. The Iceberg table spec is not aware of
>>>>>>> catalog table identifiers and this use will be the first break of this
>>>>>>> abstraction.
>>>>>>>
>>>>>>> On a side note, it does not address the denormalization issue either
>>>>>>> if we ever want to introduce the lineage in the view as a nice-to-have.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Walaa.
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Aug 16, 2024 at 10:09 AM Jan Kaul
>>>>>>> <jank...@mailbox.org.invalid> <jank...@mailbox.org.invalid> wrote:
>>>>>>>
>>>>>>>> Hi Walaa,
>>>>>>>>
>>>>>>>> I would argue that for the refresh operation the query engine has
>>>>>>>> to parse the query and then somehow execute it. For a full refresh it 
>>>>>>>> will
>>>>>>>> directly execute the query and for a incremental refresh it will 
>>>>>>>> execute a
>>>>>>>> modified version. Therefore it has to fully expand the query tree.
>>>>>>>>
>>>>>>>> Best wishes,
>>>>>>>>
>>>>>>>> Jan
>>>>>>>>
>>>>>>>> Am 16.08.2024 18:13 schrieb Walaa Eldin Moustafa <
>>>>>>>> wa.moust...@gmail.com>:
>>>>>>>>
>>>>>>>> Thanks Jan for the summary.
>>>>>>>>
>>>>>>>> For this point:
>>>>>>>>
>>>>>>>> > For a refresh operation the query engine has to parse the SQL and
>>>>>>>> fully expand the lineage with it's children anyway.  So the lineage is 
>>>>>>>> not
>>>>>>>> strictly required.
>>>>>>>>
>>>>>>>> If the lineage is provided at creation time by the respective
>>>>>>>> engine, the refresh operation does not need to parse the SQL, correct?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Walaa.
>>>>>>>>
>>>>>>>> On Fri, Aug 16, 2024 at 12:24 AM Jan Kaul
>>>>>>>> <jank...@mailbox.org.invalid> <jank...@mailbox.org.invalid> wrote:
>>>>>>>>
>>>>>>>> As the table I created is not properly shown in the mailing list
>>>>>>>> I'll reformat the summary of the different drawbacks again:
>>>>>>>>
>>>>>>>> Drawbacks of (no lineage, refresh-state key = identifier):
>>>>>>>>
>>>>>>>> - introduces catalog identifiers into table metadata (#4)
>>>>>>>> - query engine has to expand lineage at refresh time (happens
>>>>>>>> anyway)
>>>>>>>>
>>>>>>>> Drawbacks of (normalized lineage, refresh-state key = uuid):
>>>>>>>>
>>>>>>>> - recursive calls to catalog to expand lineage at read time (#2)
>>>>>>>> - fragile by requiring child views to have lineage field
>>>>>>>>
>>>>>>>> Drawbacks of (denormalized lineage, refresh-state key = uuid):
>>>>>>>>
>>>>>>>> - update of materialized view version required if child view is
>>>>>>>> updated (#5)
>>>>>>>> On 16.08.24 09:17, Jan Kaul wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Thanks Micah for clearly stating the requirements. I think this
>>>>>>>> gives better clarity for the discussion.
>>>>>>>>
>>>>>>>> It seems like we don't have a solution that satisfies all
>>>>>>>> requirements at once. So we would need to choose which has the fewest
>>>>>>>> drawbacks.
>>>>>>>>
>>>>>>>> I would like to summarize the different drawbacks that came up in
>>>>>>>> the discussion.
>>>>>>>> no lineage
>>>>>>>> + refresh-state key = identifier
>>>>>>>> normalized lineage
>>>>>>>> + refresh-state key = uuid
>>>>>>>> denormalized lineage
>>>>>>>> + refresh-state key = uuid
>>>>>>>> - introduces catalog identifiers into table metadata (#4)
>>>>>>>> - query engine has to expand lineage at refresh time (happens
>>>>>>>> anyway)
>>>>>>>> - recursive calls to catalog to expand lineage at read time (#2)
>>>>>>>> - fragile by requiring child views to have lineage field
>>>>>>>> - update of materialized view version required if child view is
>>>>>>>> updated (#5)
>>>>>>>>
>>>>>>>> With identifiers as the refresh-state keys, the lineage is not
>>>>>>>> strictly required and becomes an orthogonal proposal. That's why I 
>>>>>>>> left it
>>>>>>>> out if the comparison.
>>>>>>>>
>>>>>>>> In my opinion introducing catalog identifiers into the table
>>>>>>>> metadata (requirement #4) is the least significant drawback as it is 
>>>>>>>> not a
>>>>>>>> technical reason but more about semantics. Especially as the 
>>>>>>>> identifiers
>>>>>>>> are not introduced into the table spec but are rather stored in the
>>>>>>>> snapshot summary. That's why I'm in favor of using the catalog 
>>>>>>>> identifiers
>>>>>>>> as the refresh-state keys.
>>>>>>>>
>>>>>>>> Regarding your last point Walaa:
>>>>>>>>
>>>>>>>> The option of using catalog identifiers in the state map still
>>>>>>>> requires keeping lineage information in the view because REFRESH MV 
>>>>>>>> needs
>>>>>>>> the latest fully expanded children (which could have changed from the 
>>>>>>>> set
>>>>>>>> of children currently in the state map), without reparsing the view 
>>>>>>>> tree.
>>>>>>>>
>>>>>>>> For a refresh operation the query engine has to parse the SQL and
>>>>>>>> fully expand the lineage with it's children anyway.  So the lineage is 
>>>>>>>> not
>>>>>>>> strictly required.
>>>>>>>>
>>>>>>>> If I understand correctly, most of you are also in favor of using
>>>>>>>> catalog identifiers + ref as the refresh-state keys and postponing the
>>>>>>>> lineage proposal.
>>>>>>>>
>>>>>>>> I hope that we can move the discussion forward.
>>>>>>>>
>>>>>>>> Jan
>>>>>>>> On 16.08.24 08:07, Walaa Eldin Moustafa wrote:
>>>>>>>>
>>>>>>>> The option of using catalog identifiers in the state map still
>>>>>>>> requires keeping lineage information in the view because REFRESH MV 
>>>>>>>> needs
>>>>>>>> the latest fully expanded children (which could have changed from the 
>>>>>>>> set
>>>>>>>> of children currently in the state map), without reparsing the view 
>>>>>>>> tree.
>>>>>>>> Therefore, catalog identifiers in the state map, does not eliminate the
>>>>>>>> need for tracking children in the form of catalog identifiers in the
>>>>>>>> lineage side (but in this case lineage will be a set instead of just a
>>>>>>>> map).
>>>>>>>>
>>>>>>>> Hence, my concerns with using catalog identifiers (as opposed to
>>>>>>>> UUIDs) are:
>>>>>>>> * The fundamental issue where the table spec depends on/refers to
>>>>>>>> the view spec (because such catalog identifiers are not defined in the
>>>>>>>> table spec and the only place they have a meaning is in the view spec
>>>>>>>> lineage information).
>>>>>>>> * (less fundamental) The denormalization introduced by this
>>>>>>>> arrangement, where each identifier is 3-parts and all of them repeat in
>>>>>>>> both lineage info and state map.
>>>>>>>>
>>>>>>>> I am not very concerned with recursive expansion (through multiple
>>>>>>>> calls), as it is always the case with views.
>>>>>>>>
>>>>>>>> On a positive note, looks like we agree to move past sequence
>>>>>>>> numbers :)
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Walaa.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Aug 15, 2024 at 4:07 PM Micah Kornfield <
>>>>>>>> emkornfi...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> I think given the constraint that catalog lookup has to be by
>>>>>>>> identifier and not UUID, I'd prefer using identifier in the refresh 
>>>>>>>> state.
>>>>>>>> If we use identifiers, we can directly parallelize the catalog calls to
>>>>>>>> fetch the latest state.  If we use UUID, the engine has to go back to 
>>>>>>>> the
>>>>>>>> MV and possibly additional views to reconstruct the lineage map.  It's 
>>>>>>>> just
>>>>>>>> a lot slower and more work for the engine when there is a MV that
>>>>>>>> references a lot of views (and those views reference additional views).
>>>>>>>>
>>>>>>>>
>>>>>>>> I'm +1 on using catalog identifiers as the key.  As you point out
>>>>>>>> this is inline with #2 (try to minimize serial catalog lookups) in 
>>>>>>>> addition
>>>>>>>> to supporting requirement #3.
>>>>>>>>
>>>>>>>> On Thu, Aug 15, 2024 at 3:27 PM Benny Chow <btc...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> I think given the constraint that catalog lookup has to be by
>>>>>>>> identifier and not UUID, I'd prefer using identifier in the refresh 
>>>>>>>> state.
>>>>>>>> If we use identifiers, we can directly parallelize the catalog calls to
>>>>>>>> fetch the latest state.  If we use UUID, the engine has to go back to 
>>>>>>>> the
>>>>>>>> MV and possibly additional views to reconstruct the lineage map.  It's 
>>>>>>>> just
>>>>>>>> a lot slower and more work for the engine when there is a MV that
>>>>>>>> references a lot of views (and those views reference additional views).
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Benny
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Aug 15, 2024 at 2:14 PM Walaa Eldin Moustafa <
>>>>>>>> wa.moust...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Thanks Jan, Micah, and Karuppayya for chiming in.
>>>>>>>>
>>>>>>>> I do not think 3 and 4 are at odds with each other (for example
>>>>>>>> maintaining both lineage map and state map through UUID can achieve 
>>>>>>>> both).
>>>>>>>> Also, I do not think we can drop the lineage map since in many 
>>>>>>>> catalogs,
>>>>>>>> the only lookup method is by the catalog identifier, and not the UUID.
>>>>>>>>
>>>>>>>> I think if we go with UUIDs in the state, we should have a lineage
>>>>>>>> map (from identifiers to UUIDs) to go with it.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Walaa.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Aug 15, 2024 at 1:45 PM karuppayya <
>>>>>>>> karuppayya1...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> +1 to storing the refresh state as a map of UUIDs to snapshot IDs,
>>>>>>>> and deferring the inclusion of lineage to a future iteration.(like 
>>>>>>>> Micha
>>>>>>>> mentioned)
>>>>>>>> This would greatly simplify the current design.
>>>>>>>>
>>>>>>>> Also in terms of identifiers to use(UUID or catalog identifier) for
>>>>>>>> the refresh state
>>>>>>>> We will not be able to fetch the table/View using the UUID alone,
>>>>>>>> for example from Hive based catalog.
>>>>>>>> We do not have the direct mapping between UUID and table/view.
>>>>>>>> Which leaves us only with the catalog identifiers?
>>>>>>>>
>>>>>>>> Thanks & Regards
>>>>>>>> Karuppayya
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Aug 15, 2024 at 9:16 AM Micah Kornfield <
>>>>>>>> emkornfi...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> I think it might be worth restating perceived requirements and
>>>>>>>> making sure there is alignment on them.
>>>>>>>>
>>>>>>>> If I am reading correctly, I think the following are perceived
>>>>>>>> requirements:
>>>>>>>> 1. An engine must be able to unambiguously detect that an
>>>>>>>> underlying queried entity has changed or not via metadata to decide if
>>>>>>>> materialized table data can be used.
>>>>>>>> 2. The number of sequential catalog reads an engine needs to make
>>>>>>>> to make use of a materialized table state at read time is minimized.
>>>>>>>> 3. Engines that don't understand a SQL dialect can still use MV
>>>>>>>> information if it is not stale.
>>>>>>>> 4. Table refs (catalog identifiers) should not appear in the
>>>>>>>> materialized table metadata (i.e. state).
>>>>>>>> 5. The view part of the MV definition should not need a new
>>>>>>>> revision for any changes to objects it queries as long as their schemas
>>>>>>>> stay compatible (only state information on the materialized table need 
>>>>>>>> to
>>>>>>>> change).
>>>>>>>>
>>>>>>>> In my mind, requirement 1, is the only true requirement.  I think
>>>>>>>> this necessitates having UUID + snapshot ID as part of the state
>>>>>>>> information (not necessarily part of the Lineage).  I think it also
>>>>>>>> necessitates having a denormalized view of all entities that are inputs
>>>>>>>> into the MV in the state information (a view object might not change 
>>>>>>>> but
>>>>>>>> its underlying tables or views could change and that must be detected).
>>>>>>>>
>>>>>>>> Requirements 2 and 5 are somewhat at odds with each other.  If
>>>>>>>> information is denormalized (fully expanded) in Lineage, it means if 
>>>>>>>> table
>>>>>>>> information is somehow dropped from an intermediate view, one would 
>>>>>>>> need to
>>>>>>>> update the view (or make excess calls to the catalog). In my mind, this
>>>>>>>> argues for normalization of the lineage stored on the view (with the 
>>>>>>>> cost
>>>>>>>> of potentially 1 additional serial catalog lookup once the state
>>>>>>>> information is retrieved).
>>>>>>>>
>>>>>>>> I think #3 is at odds with #4.  I think #3 is more worthwhile, then
>>>>>>>> keeping #4 (and as Jan noted #4 adds complexity).
>>>>>>>>
>>>>>>>> I think the last remaining question is if lineage serves any
>>>>>>>> purpose.  I think it is useful for the following reasons:
>>>>>>>> a)  When there are no intermediate views queried, it allows for
>>>>>>>> fully parallelized lookup calls to the catalog without having to parse 
>>>>>>>> the
>>>>>>>> SQL statement first
>>>>>>>> b)  Allows tools that don't need to lookup state information  or
>>>>>>>> parse SQL but still navigate MV/view trees.
>>>>>>>>
>>>>>>>> Both of these seem relatively minor, so lineage could perhaps be
>>>>>>>> left out in the first iteration.
>>>>>>>>
>>>>>>>> As it applies to Jan's questions:
>>>>>>>>
>>>>>>>> 1. Should we move the identifiers out of the refresh-state into a
>>>>>>>> new lineage record that is stored as part of the view metadata?
>>>>>>>>
>>>>>>>> No, I don't think so, I think #5 is a reasonable requirement and I
>>>>>>>> think this violates it.
>>>>>>>>
>>>>>>>>
>>>>>>>> 2. If yes, should the lineage in the view be fully expanded?
>>>>>>>>
>>>>>>>> No, I think only the state should be fully expanded (for reasons
>>>>>>>> mentioned above, it potentially requires more updates to the view then
>>>>>>>> necessary).
>>>>>>>>
>>>>>>>>
>>>>>>>> 3. What should be used as an identifier in the lineage to reference
>>>>>>>> entries in the refresh-state?
>>>>>>>>
>>>>>>>>
>>>>>>>> Catalog identifiers make sense to me.  If we agree requirement #3
>>>>>>>> is not a requirement then it seems like this could also be UUIDs.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Micah
>>>>>>>>
>>>>>>>> On Thu, Aug 15, 2024 at 7:57 AM Benny Chow <btc...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> If we go with either UUID or Table Identifier +
>>>>>>>> VersionID/SnapshotId in the refresh state, then this list is fully 
>>>>>>>> expanded
>>>>>>>> already.  So, to validate the freshness of a materialization, the 
>>>>>>>> engine
>>>>>>>> doesn't even need to look at the view lineage.  IMO, the view lineage 
>>>>>>>> is
>>>>>>>> nice to have but not a necessary requirement for MVs.  The view lineage
>>>>>>>> makes sharing of views between engines without common SQL dialects
>>>>>>>> possible.
>>>>>>>>
>>>>>>>> Benny
>>>>>>>>
>>>>>>>> On Thu, Aug 15, 2024 at 12:22 AM Jan Kaul
>>>>>>>> <jank...@mailbox.org.invalid> <jank...@mailbox.org.invalid> wrote:
>>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I would like to reemphasize the purpose of the refresh-state for
>>>>>>>> materialized views. The purpose is to determine if the precomputed 
>>>>>>>> data is
>>>>>>>> fresh, stale or invalid. For that the current snapshot-id of every 
>>>>>>>> table in
>>>>>>>> the query tree has to be fetched from the catalog by using its full
>>>>>>>> identifier and ref. Additionally the refresh state stores the 
>>>>>>>> snapshot-id
>>>>>>>> of the last refresh.
>>>>>>>>
>>>>>>>> To summarize: *To determine the freshness of the precomputed data
>>>>>>>> we require the full identifier + ref and snapshot-id of the last 
>>>>>>>> refresh
>>>>>>>> for every table in the fully expanded query tree*
>>>>>>>>
>>>>>>>> This is a requirement from how the catalog works and independent
>>>>>>>> from how we design the lineage/refresh state. Additionally we 
>>>>>>>> previously
>>>>>>>> agreed that we should be able to obtain the full list of identifiers
>>>>>>>> without needing to parse the SQL definition.
>>>>>>>>
>>>>>>>> Now we are having a discussion in how to store and obtain the fully
>>>>>>>> expanded list of table identifiers and snapshot-ids. To move the 
>>>>>>>> discussion
>>>>>>>> forward I think it would be valuable to answer the following 3 
>>>>>>>> questions:
>>>>>>>>
>>>>>>>> 1. Should we move the identifiers out of the refresh-state into a
>>>>>>>> new lineage record that is stored as part of the view metadata?
>>>>>>>>
>>>>>>>> 2. If yes, should the lineage in the view be fully expanded?
>>>>>>>>
>>>>>>>> 3. What should be used as an identifier in the lineage to reference
>>>>>>>> entries in the refresh-state?
>>>>>>>>
>>>>>>>> 1. Question:
>>>>>>>>
>>>>>>>> We already agreed that this would be a good idea because we
>>>>>>>> wouldn't introduce the identifier concept to the table metadata. 
>>>>>>>> However,
>>>>>>>> looking at the complexity that comes with the alternatives, I would 
>>>>>>>> like to
>>>>>>>> keep this question open.
>>>>>>>>
>>>>>>>> 2. Question:
>>>>>>>>
>>>>>>>> I'm against using a not fully expanded lineage in the view struct.
>>>>>>>> To recall we require every identifier in the fully expanded query tree 
>>>>>>>> to
>>>>>>>> determine the freshness. Not storing all identifiers in the lineage 
>>>>>>>> would
>>>>>>>> mean to recursively call the catalog and expand the query tree at read
>>>>>>>> time. This can lead to a large overhead for determining the refresh 
>>>>>>>> state
>>>>>>>> compared to expanding the query tree once at creation time and then 
>>>>>>>> storing
>>>>>>>> the fully expanded lineage.
>>>>>>>>
>>>>>>>> 3. Question:
>>>>>>>>
>>>>>>>> This depends on Question 2.
>>>>>>>>
>>>>>>>> For a not fully expanded lineage, the only options would be uuids
>>>>>>>> or catalog identifiers.
>>>>>>>>
>>>>>>>> For a fully expanded lineage the question isn't all that relevant.
>>>>>>>> The current design specifies that the lineage is
>>>>>>>>
>>>>>>>>

Reply via email to