Hi Dan,

> While it is possible and may feel like it would prevent interoperability, that would be easily circumvented by just copying the entire contents of the table through scan/plan.
This enables the user to recreate a snapshot of the table, but it does not provide the full history or the complete table metadata. It is also significantly more involved than simply calling the register table operation.

> REST Catalog implementations have always been able to restrict access to physical storage regardless of whether a client could load the table metadata or not.

Previously, this was primarily a matter of gaining access to the underlying storage. With the introduction of CATALOG_ONLY tables, storing Iceberg metadata files is no longer required for any operation.

> there are lots of different ways closed systems can restrict access already (e.g. jdbc only or proprietary APIs), so I don't feel like this is changing that dynamic.

I'm not sure I understand this. Could you please provide more details? The goal, as I understand it, is that if a Catalog implements the Iceberg specification, migration to and from this Catalog should be possible with any other Catalog that adheres to the same specification. Introducing CATALOG_ONLY tables, however, feels like another step away from interoperability.

> I think the motivation behind catalog only mode is more for cases where the underlying data is either in a different representation or is being adapted on-the-fly. For example, if you wanted to expose a table from a database that can export data to parquet, but doesn't natively support Iceberg as a format, you can hide that behind scan plan interfaces.

Using the Scan Planning interface has been optional until now, but with the introduction of CATALOG_ONLY tables, it becomes mandatory. As a result, compliant engines will need to implement it.

> There may not be a full representation of the table metadata but using a subset of Iceberg primitives, you can still achieve interoperability (at least for read).

In earlier discussions, we agreed that tables should not implement only a subset of the Iceberg specification. This proposal seems to move in a different direction.
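Peter contrasts copying a table through scan/plan with simply calling the register table operation. As an illustration only, here is a minimal Python sketch of that register call. It assumes the REST catalog spec's `POST /v1/{prefix}/namespaces/{namespace}/register` endpoint with its `name` and `metadata-location` request fields, and a single-level namespace. The helper name, catalog URI, namespace, and file path are hypothetical. The sketch only builds the request. It also presumes that the target catalog can read the file at `metadata-location`, which is exactly the access that CATALOG_ONLY tables no longer guarantee.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def build_register_request(catalog_uri: str, prefix: str, namespace: str,
                           table_name: str, metadata_location: str) -> Request:
    """Build (but do not send) a register-table request for a REST catalog.

    Assumes a single-level namespace and the spec's register endpoint:
    POST {catalog_uri}/v1/{prefix}/namespaces/{namespace}/register
    """
    # Percent-encode path segments so unusual names don't break the URL
    path = f"/v1/{quote(prefix, safe='')}/namespaces/{quote(namespace, safe='')}/register"
    # The metadata-location typically comes from the source catalog's loadTable response
    body = {"name": table_name, "metadata-location": metadata_location}
    return Request(
        catalog_uri.rstrip("/") + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical migration: the metadata pointer returned by the old catalog's
# loadTable is handed to the new catalog, which must be able to read that file.
source_load_table_result = {
    "metadata-location": "s3://example-bucket/db/events/metadata/00003.metadata.json",
}
req = build_register_request(
    "https://new-catalog.example.com", "warehouse", "db", "events",
    source_load_table_result["metadata-location"],
)
print(req.get_method(), req.full_url)
```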
While I'm not opposed to the feature and recognize the benefits of integrating non-Iceberg tables into Iceberg catalogs and making them queryable by compatible engines, I believe it would be useful to clarify our current understanding of the boundaries and the level of feature parity we aim to maintain. Establishing this would provide a consistent framework for evaluating similar proposals going forward. This seems like a good candidate for today's catalog sync discussion.

Thanks,
Peter

On Wed, Jan 14, 2026 at 12:23 AM Daniel Weeks wrote:

> I don't feel we should be too concerned about catalogs switching to a "catalog only" mode and not providing direct access. While it is possible and may feel like it would prevent interoperability, that would be easily circumvented by just copying the entire contents of the table through scan/plan. I wouldn't agree there was implied access just by having a metadata-location field either. REST Catalog implementations have always been able to restrict access to physical storage regardless of whether a client could load the table metadata or not. I understand the concern about lock-in, but there are lots of different ways closed systems can restrict access already (e.g. jdbc only or proprietary APIs), so I don't feel like this is changing that dynamic.
>
> I think the motivation behind catalog only mode is more for cases where the underlying data is either in a different representation or is being adapted on-the-fly. For example, if you wanted to expose a table from a database that can export data to parquet, but doesn't natively support Iceberg as a format, you can hide that behind scan plan interfaces. There may not be a full representation of the table metadata but using a subset of Iceberg primitives, you can still achieve interoperability (at least for read).
>
> Introducing modes is just a way to express the intent/availability for the scan plan and coordinate between the client and server, but I don't think it really affects whether a client could be prevented from reading table data directly (a catalog can do that regardless).
>
> I would add that I don't think the spec should include anything about the client modes (I added a comment to the PR on this). The spec should only indicate what the server can return and what the expectations should be for a client. What a client implements and what configurations it exposes is more of a client-side implementation detail and should not be part of the spec.
>
> -Dan
>
> On Tue, Jan 13, 2026 at 11:07 AM Prashant Singh wrote:
>
>> Hello Peter,
>> Thank you for the feedback.
>>
>> IIUC, you mean that, under one interpretation, the metadata location could point to a dummy file that, in the worst case, simply does not exist? Sure, I believe we can be explicit there to avoid this. Note: this predates this proposal, but I am happy to take a stab at making it explicit.
>>
>> > users were required to have direct read access to the metadata files in order to plan queries on the table. That implied an access requirement, even though it was never explicitly documented
>>
>> While the requirement is true, it's not as if every user would get credentials to do so; it was strictly based on whether the user is authorized to read the table according to the privileges defined in the catalog. The credentials in loadTable were optional, meaning a catalog could choose not to vend any credentials even when the client sends X-Iceberg-Access-Delegation, per [1], and hence it can cut off any client if it wants to. I believe the flexibility is there because we don't define authorization in the IRC spec.
>> As I said, the admin is the one who gave the catalog access to storage in the first place, so the admin can revoke that access and migrate if the catalog misbehaves by forcing every table through itself for planning, moving to a different catalog if the culprit catalog doesn't fix it.
>>
>> > Maybe we add a sentence in the spec to enforce that there should be some users for whom the catalog MUST provide access to the metadata files.
>>
>> Regarding the original feedback, there will always be an ADMIN user who configured the catalog in the first place with the storage permissions (let's say providing the IAM role and establishing the trust relationship), and who can get hold of the storage and read those metadata files directly. So some access is implicit in that sense.
>>
>> I believe that by introducing CATALOG_ONLY mode for planning on top of existing assumptions, we are not introducing new ways to trap end users in vendor lock-in; as has always been the case, a user has a way to walk out of it with the existing constructs.
>>
>> Please let me know what you think, considering the above.
>>
>> [1] https://github.com/apache/iceberg/blob/fc434997fbc63a3f1f47481c0878073b1ccf6359/open-api/rest-catalog-open-api.yaml#L1886-L1887
>>
>> Best,
>> Prashant Singh
>>
>> On Tue, Jan 13, 2026 at 6:11 AM Péter Váry wrote:
>>
>>> Hi Prashant,
>>>
>>> The specification states:
>>>
>>>> The corresponding file location of table metadata should be returned in the `metadata-location` field
>>>
>>> However, it does not specify that this location must be readable by any users. (Perhaps this is something we should revisit and clarify going forward.)
>>>
>>> Before the introduction of CATALOG_ONLY tables, users were required to have direct read access to the metadata files in order to plan queries on the table.
>>> That implied an access requirement, even though it was never explicitly documented. With the introduction of CATALOG_ONLY, this implicit requirement no longer applies, and we currently do not have an explicit requirement defined in the specification either.
>>>
>>> On Mon, Jan 12, 2026 at 11:33 PM Prashant Singh wrote:
>>>
>>>> Thank you for the feedback, everyone!
>>>>
>>>> Eduard: I am open to naming it _ENFORCED, or even to having neither _ONLY nor _ENFORCED in the first place, as Dan suggested here [1]. Please let me know if you are OK with that.
>>>>
>>>> Amogh: Thank you for the feedback on the preference modes. I tried to document some concrete use cases that could benefit from them [2], as I believe they give the catalog and client some options to negotiate when both are open to it. Please let me know what you think.
>>>>
>>>> Peter: I believe this kind of vendor lock-in would not be possible, given the model we are going after: loadTable itself returns the metadata pointer, which is self-describing and can be used to register the table in a new catalog. Also, the way the catalog (IRC) has been laid out decouples compute from storage. In the end, it's the admin user who gave the catalog its admin credentials, which get scoped down based on the grants defined in the catalog, and the admin can simply revoke the catalog's access or configure a new catalog with different admin storage credentials. I tried elaborating on this in the PR feedback too [3]; please let me know what you think.
>>>>
>>>> I will stay on top of both the PR and this thread moving forward! I appreciate all your feedback.
>>>>
>>>> [1] https://github.com/apache/iceberg/pull/14867#discussion_r2673087002
>>>> [2] https://github.com/apache/iceberg/pull/14867#discussion_r2678941794
>>>> [3] https://github.com/apache/iceberg/pull/14867#discussion_r2678376025
>>>>
>>>> Best,
>>>> Prashant Singh
>>>>
>>>> On Fri, Jan 9, 2026 at 10:34 PM Péter Váry wrote:
>>>>
>>>>> I have a concern about some catalogs starting to make every table `CATALOG_ONLY`, which would essentially lock users into the catalog without providing a way to migrate the data to another catalog.
>>>>> Maybe we add a sentence in the spec to enforce that there should be some users for whom the catalog MUST provide access to the metadata files.
>>>>>
>>>>> WDYT?
>>>>>
>>>>> On Thu, Jan 8, 2026, 18:38 Amogh Jahagirdar wrote:
>>>>>
>>>>>> I did a pass over the PR, but I'm a little skeptical about what the notion of "preferences" truly gets us in the protocol. In case the endpoint is available but not enforced, my mental model is to just let the client make whatever choice it wants. If a server really thinks it's advantageous to use remote planning, I'd think it would just say server-side planning is enforced. For the "momentary load" case, all a client would need to do is handle the server throttling and fall back to client-side planning (I don't think the protocol needs to expand just for that).
>>>>>>
>>>>>> On Wed, Jan 7, 2026 at 11:28 AM Russell Spitzer wrote:
>>>>>>
>>>>>>> I'm in agreement with Prashant's current plan; I have no preference on the naming of "Only" vs. "Enforced".
>>>>>>>
>>>>>>> On Wed, Jan 7, 2026 at 4:42 AM Eduard Tudenhöfner wrote:
>>>>>>>
>>>>>>>> Instead of calling it "ONLY", maybe "ENFORCED" would be a better term?
>>>>>>>> I think that would more naturally express the behavior without having to define what "ONLY" really means.
>>>>>>>>
>>>>>>>> On Wed, Dec 24, 2025 at 12:05 AM Prashant Singh wrote:
>>>>>>>>
>>>>>>>>> *Hi everyone,*
>>>>>>>>>
>>>>>>>>> *JB:* Mostly yes, but it's more about what the server wants the client to do. At this point, the server can indicate whether it supports a mode via the /v1/config endpoint.
>>>>>>>>>
>>>>>>>>> *Russell:* Thank you for the thorough feedback! I think it is a great idea to break the optional mode into *Prefer Client | Prefer Catalog*; it really opens up a lot of interesting use cases.
>>>>>>>>>
>>>>>>>>> For example, the server might support planning but, due to momentary load, want the client to check whether it is open to planning on the client side. Similarly, an argument can be made that if the server has a table cached in memory, it would prefer that the client come to the server. Earlier, with just the optional value, we were simply falling back to server-side or client-side planning based on whether the server supported scan planning. Now, the client can express its own overrides via catalog configs as well.
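For the momentary-load case, Amogh suggests that a client only needs to handle server throttling and fall back to client-side planning when remote planning is not enforced. Below is a minimal Python sketch of that client behavior. `ThrottledError`, `plan_with_fallback`, and the two planner callables are hypothetical stand-ins, not Iceberg client APIs. Mapping HTTP 429/503 responses to `ThrottledError` is also an assumption.

```python
from typing import Callable, List


class ThrottledError(Exception):
    """Hypothetical: raised when the catalog rejects a plan request due to load (e.g. HTTP 429/503)."""


def plan_with_fallback(plan_remotely: Callable[[], List[str]],
                       plan_locally: Callable[[], List[str]],
                       remote_enforced: bool) -> List[str]:
    """Try server-side planning first; fall back to local planning only if allowed."""
    try:
        return plan_remotely()
    except ThrottledError:
        if remote_enforced:
            # Server-side planning is enforced, so local planning is not an option
            raise
        # The plan endpoint is available but optional: plan on the client instead
        return plan_locally()


def busy_catalog() -> List[str]:
    # Simulates a catalog that is currently throttling plan requests
    raise ThrottledError("catalog is under load")


# With remote planning optional, a throttled catalog triggers local planning
files = plan_with_fallback(busy_catalog, lambda: ["data/file-a.parquet"], remote_enforced=False)
print(files)
```

The sketch keeps the decision entirely on the client, which matches Amogh's view that the protocol does not need a dedicated preference value for this case.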
>>>>>>>>>
>>>>>>>>> Based on our offline discussion, I have incorporated the feedback into the updated matrix [1] to document what the planning modes would be based on the server response and client overrides:
>>>>>>>>>
>>>>>>>>> - *CLIENT_ONLY + CATALOG_ONLY* = FAIL
>>>>>>>>> - *One "ONLY" + opposite "PREFERRED"* = ONLY wins
>>>>>>>>> - *Both "PREFERRED"* = Client config wins
>>>>>>>>> - *Client not configured* = Use server config or default
>>>>>>>>>
>>>>>>>>> I will update the reference implementation soon based on this. I would love to know what other folks think!
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>>
>>>>>>>>> Prashant Singh
>>>>>>>>>
>>>>>>>>> [1] https://github.com/apache/iceberg/pull/14867#issuecomment-3683989832
>>>>>>>>>
>>>>>>>>> On Sat, Dec 20, 2025 at 1:26 PM Russell Spitzer wrote:
>>>>>>>>>
>>>>>>>>>> I can imagine one more:
>>>>>>>>>>
>>>>>>>>>> (None - I would rename this) ClientOnly - Client can use Catalog Planning or Local Planning
>>>>>>>>>>
>>>>>>>>>> PreferClient - Client should use local planning, but the plan API is available for this table. I can only imagine this would be useful for a scenario where most clients are heavy and have the resources to do local planning (or engine distributed planning), but you still want to support lightweight clients which can't really do planning themselves.
>>>>>>>>>>
>>>>>>>>>> PreferCatalog - Client should use the plan API, but credentials have been provided to enable local planning. This is probably a transitional state as we move from clients that only support local planning to those which can use the plan API.
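The resolution rules in Prashant's Dec 24 matrix can be expressed as a small pure function. The sketch below is an illustration only. It assumes four modes named `CLIENT_ONLY`, `CLIENT_PREFERRED`, `CATALOG_PREFERRED`, and `CATALOG_ONLY` (hypothetical spellings and string values). It also assumes that conflicting ONLY values fail in either order, that a configured client wins when the server sends no mode, and that the fallback `default` is a hypothetical parameter. The PR linked in the thread remains the authoritative source for the actual behavior.

```python
from enum import Enum
from typing import Optional


class PlanningMode(Enum):
    # Hypothetical names for the four modes discussed in the thread
    CLIENT_ONLY = "client-only"
    CLIENT_PREFERRED = "client-preferred"
    CATALOG_PREFERRED = "catalog-preferred"
    CATALOG_ONLY = "catalog-only"

    @property
    def is_only(self) -> bool:
        return self in (PlanningMode.CLIENT_ONLY, PlanningMode.CATALOG_ONLY)


class PlanningModeConflict(Exception):
    """Raised when the server and client demand opposite ONLY modes."""


def resolve_mode(server: Optional[PlanningMode],
                 client: Optional[PlanningMode],
                 default: PlanningMode = PlanningMode.CLIENT_PREFERRED) -> PlanningMode:
    # Client not configured: use the server's mode, or the (assumed) default
    if client is None:
        return server if server is not None else default
    # Assumption: a configured client wins when the server sends nothing
    if server is None:
        return client
    # CLIENT_ONLY + CATALOG_ONLY (in either order): fail
    if server.is_only and client.is_only and server is not client:
        raise PlanningModeConflict(f"server={server.name}, client={client.name}")
    # Any remaining ONLY (e.g. ONLY + opposite PREFERRED): ONLY wins
    if server.is_only:
        return server
    if client.is_only:
        return client
    # Both PREFERRED: the client config wins
    return client


print(resolve_mode(PlanningMode.CATALOG_ONLY, PlanningMode.CLIENT_PREFERRED))
```

Keeping the resolution in one side-effect-free function makes each row of the matrix directly testable.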
>>>>>>>>>>
>>>>>>>>>> CatalogOnly - Clients are not provided with the credentials required to read the table from the metadata.json alone. If they do not implement the scan plan API, they should fail fast; otherwise they will fail when they attempt to load a manifest list file. This is used in circumstances where the catalog is giving out file-specific credentials or is protecting the delivered files in some way, such that their contents have been specially redacted or something like that.
>>>>>>>>>>
>>>>>>>>>> I assume most catalogs will start with "ClientOnly" or "None".
>>>>>>>>>>
>>>>>>>>>> Then, as catalogs begin to support the planning API, we will see most tables move to PreferCatalog, with some perhaps extremely heavy or large tables staying as PreferClient or ClientOnly.
>>>>>>>>>>
>>>>>>>>>> Then catalogs with special protections may have some tables return CatalogOnly, so they can either scope credentials more tightly or manipulate the files that the client actually has access to in some way.
>>>>>>>>>>
>>>>>>>>>> On Sat, Dec 20, 2025 at 1:09 AM Jean-Baptiste Onofré wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Prashant,
>>>>>>>>>>>
>>>>>>>>>>> It makes sense to me. I guess we are using Catalog properties to indicate to the client what the REST server supports, right?
>>>>>>>>>>> I will take a look at the PR, but I like the idea.
>>>>>>>>>>>
>>>>>>>>>>> Regards
>>>>>>>>>>> JB
>>>>>>>>>>>
>>>>>>>>>>> On Sat, Dec 20, 2025 at 12:53 AM Prashant Singh wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hey All,
>>>>>>>>>>>>
>>>>>>>>>>>> I wanted to start a discussion on introducing the concept of a REST scan planning mode, which would let the server instruct the client on how to plan the table, either via the loadTableResponse or via a table-level config override.
>>>>>>>>>>>> There are three possible values one could think of:
>>>>>>>>>>>> 1. *None*: i.e. plan it on the client side; this may be because the table is too small and an additional REST request would add more overhead than benefit.
>>>>>>>>>>>> 2. *Optional*: the client can choose to plan it either locally or trigger server-side planning.
>>>>>>>>>>>> 3. *Required*: the client MUST do server-side planning. The server could suggest this if it has better indexed the Iceberg metadata, if the client is running on low resources, or if the table is protected. The server MAY choose whatever means are required to ensure the client can't bypass this; for example, it could skip vending credentials as part of loadTable and only mint them as part of planning completion, which would mean that a client that doesn't call plan table could not read the table.
>>>>>>>>>>>>
>>>>>>>>>>>> I have proactively created a pull request [1]; I would love to hear all your feedback, either here or in the PR directly!
>>>>>>>>>>>>
>>>>>>>>>>>> Wishing you all very happy holidays; it has been great working with you all.
>>>>>>>>>>>>
>>>>>>>>>>>> [1] https://github.com/apache/iceberg/pull/14867
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Prashant Singh
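Prashant's original proposal suggests that a server enforcing *Required* could withhold credentials from loadTable and mint them only when planning completes. Below is a minimal Python sketch of that server-side idea. The `ScanPlanningMode` values mirror the three proposed modes. The response dataclasses, their field names, and the `scan-planning-mode` config key are hypothetical simplifications, not the REST spec's actual schemas.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class ScanPlanningMode(Enum):
    NONE = "none"          # plan on the client side
    OPTIONAL = "optional"  # client may plan locally or on the server
    REQUIRED = "required"  # client MUST use server-side planning


@dataclass
class LoadTableResponse:
    # Hypothetical, simplified shape; not the spec's LoadTableResult schema
    metadata_location: str
    config: Dict[str, str]
    storage_credentials: List[Dict[str, str]] = field(default_factory=list)


@dataclass
class PlanResponse:
    # Hypothetical, simplified result of a completed server-side plan
    data_files: List[str]
    storage_credentials: List[Dict[str, str]] = field(default_factory=list)


def load_table(mode: ScanPlanningMode, metadata_location: str,
               credentials: List[Dict[str, str]]) -> LoadTableResponse:
    # When planning is REQUIRED, withhold credentials so the client cannot read
    # the table's metadata files directly and bypass the plan endpoint
    vended = [] if mode is ScanPlanningMode.REQUIRED else list(credentials)
    return LoadTableResponse(metadata_location,
                             {"scan-planning-mode": mode.value},  # hypothetical key
                             vended)


def complete_plan(data_files: List[str],
                  credentials: List[Dict[str, str]]) -> PlanResponse:
    # Credentials are handed out only alongside the planned files
    return PlanResponse(list(data_files), list(credentials))


creds = [{"prefix": "s3://example-bucket/db/events", "token": "example-token"}]
resp = load_table(ScanPlanningMode.REQUIRED,
                  "s3://example-bucket/db/events/metadata/00003.metadata.json", creds)
plan = complete_plan(["s3://example-bucket/db/events/data/f1.parquet"], creds)
print(resp.storage_credentials, len(plan.storage_credentials))
```

A real server would presumably also scope the minted credentials to the planned files; the sketch omits that step.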
