Hi,

Thanks Eric. We briefly discussed the metadata.json caching in London, so we're aligned.
The caching makes sense, but I wonder about the persistence layer. Maybe we should clearly state who is responsible for what. Overall it looks like a great idea.

Regards,
JB

On Fri, May 23, 2025 at 11:52 PM Yufei Gu <flyrain...@gmail.com> wrote:
> Thanks for doing this, Eric! It will boost performance a lot for tables with reasonably sized metadata.json files. We also automatically get an in-memory cache since the Polaris entity is cached by default.
> Agreed to defer any separate caching mechanism so that we don't have to worry about consistency issues.
>
> Yufei
>
> On Fri, May 23, 2025 at 7:57 AM Eric Maynard <eric.w.mayn...@gmail.com> wrote:
>
> > Hi all,
> >
> > Some time ago I opened this PR <https://github.com/apache/polaris/pull/433>, which proposes to store/cache TableMetadata in the Polaris metastore, avoiding a trip to object storage in many cases. Based on this recent comment <https://github.com/apache/polaris/pull/433#issuecomment-2904298967>, I wanted to start a mailing list thread to discuss this feature, as comment threads on what is now a very old PR can be hard to follow.
> >
> > The proposal, in a nutshell, is to add a new internal property, metadata-cache-content, to IcebergTableLikeEntity's internal properties and to use it to store the exact contents of a table's metadata.json. The content can be updated whenever the metadata.json is read, and this can be configured to happen only for metadata.json files below some approximate size.
> >
> > I recently used the benchmark suite proposed in this PR <https://github.com/apache/polaris-tools/pull/21> to measure the impact of the change and found that it dramatically improves loadTable performance.
> >
> > Some things that have been brought up which are *not* in scope for this PR:
> >
> > 1. Directly loading the metadata.json content into a LoadTableResponse without building an in-memory TableMetadata object. This was previously in the PR but was removed after this comment <https://github.com/apache/polaris/pull/433#issuecomment-2885074219> from Russell; it's planned as a followup.
> > 2. Storing individual parts of a table's metadata.json in persistence, e.g. just the schema. We can do this if a use case arises, but being able to store the whole table metadata is beneficial immediately.
> > 3. A separate entity for table metadata. Because we add the table metadata to IcebergTableLikeEntity, we immediately benefit from the entity cache and don't have to worry too much about consistency.
> > 4. A separate cache for table metadata. As with the above, this would make handling consistency more complicated. A separate cache just for table metadata, perhaps with its own size or TTL configuration, could be a good followup, but it's not necessary to make things work.
> >
> > This is a feature that has the potential to deliver tremendous latency benefits and one that opens up several interesting possibilities for followup improvements.
> >
> > If you're interested in the feature, please check out the PR or join the discussion here. Thanks!
> >
> > --EM