Hello Peter,
Thank you for the feedback.

IIUC, you mean that, under one interpretation, the returned metadata
location could be a dummy file which in the worst case simply does not
exist? Sure, I believe we can be explicit there to avoid this.
Note: this predates this proposal, though, and I am happy to take a stab
at being explicit here.

> users were required to have direct read access to the metadata files in
> order to plan queries on the table. That implied an access requirement,
> even though it was never explicitly documented

While the requirement is true, it is not as if every user would get
credentials to do so. It was strictly based on whether the user is
authorized to read the table under the privileges defined in the catalog.
The credentials in the loadTable response were optional, meaning a catalog
could very well choose not to vend any credentials despite the client
sending X-Iceberg-Access-Delegation [1], and hence it can cut off any
client if it wants to. I believe the flexibility is there because we don't
define authorization in the IRC spec. As I said, the admin is the one who
gave the catalog access to storage in the first place, so the admin can
very well revoke that access if the catalog is misbehaving by routing every
table through itself for planning, and can migrate to a different catalog
if the culprit catalog doesn't fix it.

> Maybe we add a sentence in the spec to enforce that there should be some
> users where the catalog MUST provide access to the metadata files.

Regarding the original feedback, there will always be an ADMIN user who
configured the catalog with the storage permission in the first place
(let's say by providing the IAM and establishing the trust relationship)
and who can get hold of the storage and access those metadata files
directly. So such users exist implicitly in that sense.

I believe that by introducing the CATALOG_ONLY mode for planning on top of
existing assumptions, we are not introducing new ways to trap end users in
vendor lock-in. As has always been the case, a user has a way to walk out
of it with the existing constructs.

Please let me know what you think, considering the above.

[1]
https://github.com/apache/iceberg/blob/fc434997fbc63a3f1f47481c0878073b1ccf6359/open-api/rest-catalog-open-api.yaml#L1886-L1887

Best,
Prashant Singh

On Tue, Jan 13, 2026 at 6:11 AM Péter Váry <[email protected]>
wrote:

> Hi Prashant,
>
> The specification states:
>
>> The corresponding file location of table metadata should be returned in
>> the `metadata-location` field
>
>
> However, it does not specify that this location must be readable by any
> users. (Perhaps this is something we should revisit and clarify going
> forward.)
>
> Before the introduction of CATALOG_ONLY tables, users were required to
> have direct read access to the metadata files in order to plan queries on
> the table. That implied an access requirement, even though it was never
> explicitly documented. With the introduction of CATALOG_ONLY, this implicit
> requirement no longer applies, and we currently do not have an explicit
> requirement defined in the specification either.
>
> Prashant Singh <[email protected]> wrote (on Mon, Jan 12, 2026,
> 23:33):
>
>> Thank you for the feedback, everyone!
>>
>> Eduard: I am open to naming it _ENFORCED, or even to not having _ONLY or
>> _ENFORCED in the first place, as Dan suggested here [1]. Please let me
>> know if you are OK with that.
>>
>> Amogh: Thank you for the feedback on the _preference mode. I tried to
>> document some concrete use cases that could benefit from it [2], as I
>> believe it can provide some options for the catalog and client to
>> negotiate when they are open to it. Please let me know what you think.
>>
>> Peter: I believe this kind of vendor lock-in would not be possible with
>> the model we are going after: in loadTable itself we get back the
>> metadata pointer, which is self-describing and can be used to register
>> the table in a new catalog. Also, the way the catalog (IRC) has been laid
>> out decouples compute from storage, so in the end it is the admin user
>> who has given the catalog the admin credentials, which get scoped down
>> based on the grants defined in the catalog. The ADMIN can simply revoke
>> the catalog's access, or can configure a new catalog with different admin
>> storage credentials.
>> I tried elaborating more on this in the PR feedback too [3]. Please let
>> me know what you think.
>>
>> I will stay on top of both the PR and the thread moving forward! I
>> appreciate all your feedback.
>>
>> [1] https://github.com/apache/iceberg/pull/14867#discussion_r2673087002
>> [2] https://github.com/apache/iceberg/pull/14867#discussion_r2678941794
>> [3] https://github.com/apache/iceberg/pull/14867#discussion_r2678376025
>>
>> Best,
>> Prashant Singh
>>
>> On Fri, Jan 9, 2026 at 10:34 PM Péter Váry <[email protected]>
>> wrote:
>>
>>> I have a concern about some catalogs starting to make every table
>>> `CATALOG_ONLY`, which would essentially lock users to the catalog without
>>> providing a way to migrate the data to another catalog.
>>> Maybe we should add a sentence to the spec to enforce that there are
>>> some users to whom the catalog MUST provide access to the metadata files.
>>>
>>> WDYT?
>>>
>>> On Thu, Jan 8, 2026, 18:38 Amogh Jahagirdar <[email protected]> wrote:
>>>
>>>> I did a pass over the PR, but I guess I'm a little skeptical about what
>>>> the notion of "preferences" truly gets us in the protocol. In case the
>>>> endpoint is available but not enforced, my mental model is to just let
>>>> the client make whatever choice it wants. If a server really thinks it's
>>>> advantageous to use remote planning, I'd think it would just say that
>>>> server-side planning is enforced. For the "momentary load" case, all a
>>>> client would need to do is handle the server throttling and fall back
>>>> to client-side planning (I don't think the protocol needs to expand just
>>>> for that).
>>>>
>>>> On Wed, Jan 7, 2026 at 11:28 AM Russell Spitzer <
>>>> [email protected]> wrote:
>>>>
>>>>> I'm in agreement with Prashant's current plan. I have no preference
>>>>> on the naming of "Only" vs "Enforced".
>>>>>
>>>>> On Wed, Jan 7, 2026 at 4:42 AM Eduard Tudenhöfner <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Instead of calling it "ONLY", maybe "ENFORCED" would be a better
>>>>>> term? I think that would more naturally express the behavior without
>>>>>> having to define what "ONLY" really means.
>>>>>>
>>>>>> On Wed, Dec 24, 2025 at 12:05 AM Prashant Singh <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> *Hi everyone,*
>>>>>>>
>>>>>>> *JB:* Mostly yes, but it's more about what the server wants the
>>>>>>> client to do. The server can indicate if it supports a mode or not
>>>>>>> via the /v1/config endpoint at this point.
>>>>>>>
>>>>>>> *Russell:* Thank you for the thorough feedback! I think it is a
>>>>>>> great idea to break the optional mode into *Prefer Client | Prefer
>>>>>>> Catalog*—it really opens up a lot of interesting use cases.
>>>>>>>
>>>>>>> For example, the server might support planning but, due to momentary
>>>>>>> load, wants the client to see if it's open to planning on the client
>>>>>>> side. Similarly, an argument can be made that if the server has a
>>>>>>> table cached in memory, it would prefer the client comes to the
>>>>>>> server. Earlier, with just the optional value, we were simply falling
>>>>>>> back to server or client side planning based on whether the server
>>>>>>> supported scan planning. Now, the client can express its own overrides
>>>>>>> via catalog configs as well.
>>>>>>>
>>>>>>> Based on our offline discussion, I have incorporated the feedback
>>>>>>> into the updated matrix [1] to document what the planning modes would be
>>>>>>> based on the server response and client overrides:
>>>>>>>
>>>>>>>    - *CLIENT_ONLY + CATALOG_ONLY* = FAIL
>>>>>>>    - *One "ONLY" + opposite "PREFERRED"* = ONLY wins
>>>>>>>    - *Both "PREFERRED"* = Client config wins
>>>>>>>    - *Client not configured* = Use server config or default
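
The resolution rules in the list above can be sketched roughly as follows.
This is only an illustration under stated assumptions, not the reference
implementation from the PR: the enum, the function name, and the default
mode are hypothetical, and the handling of cases the list does not mention
(same-side combinations, or a missing server value) is a guess.

```python
from enum import Enum
from typing import Optional


class PlanningMode(Enum):
    # Hypothetical names; the PR may spell these differently.
    CLIENT_ONLY = "client-only"
    CLIENT_PREFERRED = "client-preferred"
    CATALOG_PREFERRED = "catalog-preferred"
    CATALOG_ONLY = "catalog-only"


ONLY_MODES = {PlanningMode.CLIENT_ONLY, PlanningMode.CATALOG_ONLY}

# Assumption: the thread does not say what the default mode is.
DEFAULT_MODE = PlanningMode.CLIENT_PREFERRED


def resolve_mode(
    server: Optional[PlanningMode], client: Optional[PlanningMode]
) -> PlanningMode:
    """Combine the server-advertised mode with the client override."""
    if client is None:
        # Client not configured: use server config or default.
        return server if server is not None else DEFAULT_MODE
    if server is None:
        # Assumption: not covered by the list; honor the client setting.
        return client
    if server in ONLY_MODES and client in ONLY_MODES:
        if server is not client:
            # CLIENT_ONLY + CATALOG_ONLY = FAIL
            raise ValueError(f"incompatible modes: {server.name} vs {client.name}")
        return server
    if server in ONLY_MODES:
        # An "ONLY" beats a "PREFERRED" (same-side case is an assumption).
        return server
    if client in ONLY_MODES:
        return client
    # Both "PREFERRED": client config wins.
    return client
```

For instance, a server advertising CATALOG_ONLY combined with a client
configured as CLIENT_PREFERRED resolves to CATALOG_ONLY in this sketch.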
>>>>>>>
>>>>>>> I will update the reference implementation soon based on this. I
>>>>>>> would love to know what other folks think!
>>>>>>>
>>>>>>> Best,
>>>>>>>
>>>>>>> Prashant Singh
>>>>>>>
>>>>>>> [1]
>>>>>>> https://github.com/apache/iceberg/pull/14867#issuecomment-3683989832
>>>>>>>
>>>>>>> On Sat, Dec 20, 2025 at 1:26 PM Russell Spitzer <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> I can imagine one more
>>>>>>>>
>>>>>>>>
>>>>>>>> (None - I would rename this) ClientOnly - Client can use Catalog
>>>>>>>> Planning or Local Planning
>>>>>>>>
>>>>>>>> PreferClient - Client should use local planning, but the plan API
>>>>>>>> is available for this table. I can only imagine this would be useful
>>>>>>>> for a scenario where most clients are heavy and have the resources to
>>>>>>>> do local planning (or engine distributed planning) but you still want
>>>>>>>> to support lightweight clients which can’t really do planning
>>>>>>>> themselves.
>>>>>>>>
>>>>>>>> PreferCatalog - Client should use the plan API, but credentials
>>>>>>>> have been provided to enable local planning. This is probably a
>>>>>>>> transitional state as we move from clients that only support local
>>>>>>>> planning to those which can use the plan API.
>>>>>>>>
>>>>>>>> CatalogOnly - Clients are not provided with the credentials
>>>>>>>> required to read the table from the Metadata.json alone. If they do
>>>>>>>> not implement the scan plan API they should fail fast, otherwise they
>>>>>>>> will fail when they attempt to load a manifest_list file. This is
>>>>>>>> used in circumstances where the catalog is giving either
>>>>>>>> file-specific credentials or is protecting the delivered files in
>>>>>>>> some way, such that their contents have been specially redacted or
>>>>>>>> something like that.
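
A minimal sketch of how a client might act on the four modes described
above, assuming it only needs to pick between local and catalog planning.
The enum, the function, and the "local"/"catalog" return values are
hypothetical, and treating ClientOnly as "plan locally" is an assumption
based on its name.

```python
from enum import Enum


class TableMode(Enum):
    # Names taken from the descriptions above; the enum itself is hypothetical.
    CLIENT_ONLY = "ClientOnly"
    PREFER_CLIENT = "PreferClient"
    PREFER_CATALOG = "PreferCatalog"
    CATALOG_ONLY = "CatalogOnly"


def choose_planner(mode: TableMode, supports_plan_api: bool) -> str:
    """Return "local" or "catalog", or raise if the table cannot be planned."""
    if mode is TableMode.CATALOG_ONLY:
        if not supports_plan_api:
            # Fail fast rather than failing later on the manifest list read.
            raise RuntimeError(
                "table requires catalog planning, but this client "
                "does not implement the scan plan API"
            )
        return "catalog"
    if mode is TableMode.PREFER_CATALOG:
        # Credentials are vended, so local planning stays a valid fallback.
        return "catalog" if supports_plan_api else "local"
    # PreferClient, and ClientOnly (assumed here to mean local planning).
    return "local"
```

With this sketch, a client without plan API support still works against a
PreferCatalog table, but gets an immediate error for a CatalogOnly table.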
>>>>>>>>
>>>>>>>>
>>>>>>>> I assume most catalogs will start with “ClientOnly” or “None”
>>>>>>>>
>>>>>>>> Then as catalogs begin to support the planning API we will see most
>>>>>>>> tables move to PreferCatalog, with some perhaps extremely heavy or
>>>>>>>> large tables staying as PreferClient or ClientOnly.
>>>>>>>>
>>>>>>>> Then catalogs with special protections may have some tables return
>>>>>>>> CatalogOnly so they can either scope credentials more tightly or
>>>>>>>> manipulate the files that the client actually has access to in some
>>>>>>>> way.
>>>>>>>>
>>>>>>>> On Sat, Dec 20, 2025 at 1:09 AM Jean-Baptiste Onofré <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi Prashant
>>>>>>>>>
>>>>>>>>> It makes sense to me. I guess we are using Catalog properties to
>>>>>>>>> indicate what the REST server supports to the client, right ?
>>>>>>>>> I will take a look at the PR, but I like the idea.
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>> JB
>>>>>>>>>
>>>>>>>>> On Sat, Dec 20, 2025 at 12:53 AM Prashant Singh <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hey All,
>>>>>>>>>>
>>>>>>>>>> I wanted to bring up the discussion of introducing a concept of
>>>>>>>>>> REST scan planning mode, which would help the server instruct the
>>>>>>>>>> client on how to plan the table, via loadTableResponse or a
>>>>>>>>>> table-level config override.
>>>>>>>>>> There are three possible values which one could think of:
>>>>>>>>>> 1. *None*: i.e. plan it on the client side. This may be because the
>>>>>>>>>> table is too small and the additional REST request would add more
>>>>>>>>>> overhead than benefit.
>>>>>>>>>> 2. *Optional*: the client can choose to plan it either locally or
>>>>>>>>>> can trigger server-side planning.
>>>>>>>>>> 3. *Required*: the client MUST do server-side planning. The server
>>>>>>>>>> could suggest this if it has better indexed the Iceberg metadata,
>>>>>>>>>> the client is running on low resources, or the table is protected.
>>>>>>>>>> The server MAY choose whatever way is required to enforce that the
>>>>>>>>>> client can't bypass this. For example, it could not vend credentials
>>>>>>>>>> as part of loadTable and only mint them as part of planning
>>>>>>>>>> completion; this would mean if the client doesn't call plan table .
>>>>>>>>>> I proactively have created a pull request [1], would love to know
>>>>>>>>>> all your feedback either here or in the PR directly !
>>>>>>>>>>
>>>>>>>>>> Wish you all a very happy Holidays, it has been great working
>>>>>>>>>> with you all.
>>>>>>>>>>
>>>>>>>>>> [1] https://github.com/apache/iceberg/pull/14867
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Prashant Singh
>>>>>>>>>>
>>>>>>>>>
