Thanks Yufei!

I think it's worth considering whether to leverage the Plan/Preplan APIs,
as Jack alluded to. This makes sense from a scale perspective, since in the
worst case the Plan/Preplan APIs need to be able to churn through all the
metadata anyway. However, with this approach we would want to think through
the API modeling, because Plan/Preplan is currently designed around data
file/delete file scan tasks. It sounds theoretically possible, but at least
to me it's not obvious that it would make for a good API to work with.

Overall though, I like the general direction of this.

Thanks,

Amogh Jahagirdar

On Thu, Jul 4, 2024 at 4:10 AM Robert Stupp <sn...@snazy.de> wrote:

> Hi Yufei,
>
> I think the proposal is very interesting! The direction this and other
> proposals are going is IMO the right one.
>
> Since many proposals need access to at least manifest-lists and manifest
> files, potentially also data/delete files, does it make sense to bundle all
> proposals that need this ability?
>
> Robert
> On 03.07.24 22:44, Yufei Gu wrote:
>
> Hi folks,
>
> I'd like to discuss a new proposal to support server-side metadata tables.
>
> One of Iceberg's most advantageous features is the ability to inspect a
> table using metadata tables. For instance, we can query snapshots just like
> we query data rows using the following command: SELECT * FROM
> prod.db.table.snapshots;
>
> With the REST catalog, we can simplify this process further by providing
> metadata directly from REST endpoints. Here are several benefits of this
> approach:
>
>    - Engine Independence: The metadata tables do not rely on a specific
>    implementation of an engine. The REST server returns the results directly.
>    For example, the Rust Iceberg does not need to implement its own logic to
>    query the snapshot table if it connects to a server with this capability.
>    This reduces the complexity and development effort required for different
>    clients and engines.
>    - Enables New Use Cases: A catalog UI or Lakehouse UI can present a
>    table's metadata (e.g., snapshot/partition list) without relying on an
>    engine like Trino. This opens up possibilities for lightweight UIs and
>    tools that can directly interact with the REST endpoints to retrieve and
>    display metadata.
>    - Enhanced Performance: With server-side caching, the server-side
>    metadata tables will perform better. Caching reduces the need to repeatedly
>    compute or retrieve metadata, leading to faster response times and reduced
>    load on the underlying storage systems.
>
> Here is the proposal as a Google Doc:
> https://docs.google.com/document/d/1MVLwyMQtZ-7jewsQ0PuTvtJbpfl4HCoVdbowMqFTmfc/edit?usp=sharing
>
> Estimated read time: 5 mins
>
> Would really appreciate any feedback on this topic and proposal!
>
>
> Yufei
>
> --
> Robert Stupp
> @snazy
>
>
