Thanks Yufei! I think it's worth thinking through whether it makes sense to leverage the Plan/Preplan APIs, as Jack alluded to. This makes sense from a scale standpoint, since in the worst case the Plan/Preplan APIs need to be able to churn through all the metadata anyway. However, with this approach we would want to think through the API modeling, because Plan/Preplan is currently designed around data file/delete file scan tasks. It sounds theoretically possible, but at least to me it's not obvious that it would make for a good API to work with.

Overall though, I like the general direction of this.

Thanks,
Amogh Jahagirdar

On Thu, Jul 4, 2024 at 4:10 AM Robert Stupp <sn...@snazy.de> wrote:

> Hi Yufei,
>
> I think the proposal is very interesting! The direction this and other
> proposals are going is IMO the right one.
>
> Since many proposals need access to at least manifest-lists and manifest
> files, potentially also data/delete files, does it make sense to bundle all
> proposals that need this ability?
>
> Robert
>
> On 03.07.24 22:44, Yufei Gu wrote:
>
> Hi folks,
>
> I'd like to discuss a new proposal to support server-side metadata tables.
>
> One of Iceberg's most advantageous features is the ability to inspect a
> table using metadata tables. For instance, we can query snapshots just like
> we query data rows using the following command:
>
> SELECT * FROM prod.db.table.snapshots;
>
> With the REST catalog, we can simplify this process further by providing
> metadata directly from REST endpoints. Here are several benefits of this
> approach:
>
> - Engine Independence: The metadata tables do not rely on a specific
>   implementation of an engine. The REST server returns the results directly.
>   For example, the Rust Iceberg does not need to implement its own logic to
>   query the snapshot table if it connects to a server with this capability.
>   This reduces the complexity and development effort required for different
>   clients and engines.
> - Enabled New Use Cases: A catalog UI or Lakehouse UI can present a
>   table's metadata (e.g., snapshot/partition list) without relying on an
>   engine like Trino. This opens up possibilities for lightweight UIs and
>   tools that can directly interact with the REST endpoints to retrieve and
>   display metadata.
> - Enhanced Performance: With server-side caching, the server-side
>   metadata tables will perform better. Caching reduces the need to repeatedly
>   compute or retrieve metadata, leading to faster response times and reduced
>   load on the underlying storage systems.
>
> Here is the proposal in google doc:
> https://docs.google.com/document/d/1MVLwyMQtZ-7jewsQ0PuTvtJbpfl4HCoVdbowMqFTmfc/edit?usp=sharing
>
> Estimated read time: 5 mins
>
> Would really appreciate any feedback on this topic and proposal!
>
> Yufei
>
> --
> Robert Stupp
> @snazy