Hi

Thanks Eric. We briefly discussed the metadata.json question together in
London, so we're aligned.

The caching makes sense, but I wonder about the persistence layer. Maybe we
should clearly state who is responsible for what.

Overall it looks like a great idea.

Regards
JB

On Fri, May 23, 2025 at 11:52 PM Yufei Gu <flyrain...@gmail.com> wrote:

> Thanks for doing this, Eric! It will boost performance a lot for tables
> with reasonable size metadata.json files. We also automatically get an
> in-memory cache since the Polaris entity is cached by default.
> Agreed to defer any separated caching mechanism so that we don't have to
> care about consistency issues.
>
> Yufei
>
>
> On Fri, May 23, 2025 at 7:57 AM Eric Maynard <eric.w.mayn...@gmail.com>
> wrote:
>
> > Hi all,
> >
> > Some time ago I opened this PR
> > <https://github.com/apache/polaris/pull/433>
> > which proposes to store/cache TableMetadata in the Polaris metastore,
> > avoiding a trip to object storage in many cases. Based on this recent
> > comment
> > <https://github.com/apache/polaris/pull/433#issuecomment-2904298967> I
> > wanted to start up a mailing list thread for discussion about this
> > feature as it might be a little hard to follow comment threads on what
> > is now a very old PR.
> >
> > The proposal is, in a nutshell, to add a new internal property
> > metadata-cache-content to IcebergTableLikeEntity's internal properties
> > and to use that to store the exact contents of a table's metadata.json.
> > The content can be updated whenever the metadata.json is read and can be
> > configured to happen only for metadata.json files below some approximate
> > size.
> >
> > I recently used the benchmark suite proposed in this PR
> > <https://github.com/apache/polaris-tools/pull/21> to measure the impact
> > of the change and found it to dramatically improve loadTable performance.
> >
> > Some things that have been brought up which are *not* in scope for this
> > PR:
> > 1. Directly loading the metadata.json content into a LoadTableResponse
> > without building an in-memory TableMetadata object was previously in the
> > PR but removed after this comment
> > <https://github.com/apache/polaris/pull/433#issuecomment-2885074219>
> > from Russell; it's planned as a followup.
> > 2. Storing individual parts of table metadata.json in persistence, i.e.
> > just the schema. We can do this if a use case arises, but being able to
> > store whole table metadata is beneficial immediately.
> > 3. A separate entity for table metadata. Because we add the table
> > metadata to IcebergTableLikeEntity we immediately benefit from the
> > entity cache and don't have to worry too much about consistency.
> > 4. A separate cache for table metadata. Similar to the above, this would
> > make handling consistency more complicated. Having a separate cache,
> > maybe with its own size or TTL configurations, just for table metadata
> > could be a good followup but it's not necessary to make things work.
> >
> > This is a feature that has the potential to deliver tremendous latency
> > benefits and one that opens up several interesting possibilities for
> > followup improvements.
> >
> > If you're interested in the feature, please check out the PR or join the
> > discussion here. Thanks!
> >
> > --EM
> >
>
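
For readers following the thread, here is a minimal sketch of the idea in
Eric's proposal quoted above: keep the raw metadata.json text in an internal
property on the table entity, and only do so when the file is below a size
limit. This is not the code from the PR. Only the property name
metadata-cache-content comes from the proposal; TableEntity,
load_metadata_json, read_from_storage and max_cached_bytes are hypothetical
names made up for illustration, and invalidation when a commit changes the
metadata location is not shown.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Property name taken from the proposal; everything else is hypothetical.
CACHE_KEY = "metadata-cache-content"


@dataclass
class TableEntity:
    """Hypothetical stand-in for IcebergTableLikeEntity."""
    metadata_location: str
    internal_properties: Dict[str, str] = field(default_factory=dict)


def load_metadata_json(
    entity: TableEntity,
    read_from_storage: Callable[[str], str],
    max_cached_bytes: int,
) -> str:
    """Return the metadata.json text, preferring the copy on the entity."""
    cached = entity.internal_properties.get(CACHE_KEY)
    if cached is not None:
        # Served from the entity: no trip to object storage.
        return cached

    content = read_from_storage(entity.metadata_location)

    # Only keep the content on the entity when it is small enough.
    if len(content.encode("utf-8")) <= max_cached_bytes:
        entity.internal_properties[CACHE_KEY] = content
    return content
```

Because the content lives on the entity itself, anything that already caches
the entity would also hold the metadata, which is the point made in items 3
and 4 of the quoted list.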
