Re: [DISCUSS] REST: Scan Planning mode

Péter Váry Wed, 04 Feb 2026 01:19:45 -0800

> I'm a little concerned about using the REST spec as a means to force
> portability on implementations.  I feel that level of requirement could
> result in a reluctance to provide interoperability which would limit access
> to data or normalize non-compliance with the spec.  Ultimately, I feel user
> demand will drive the goals of openness and portability, which is a trend
> we see across the ecosystem and continues to drive interest in open formats
> and standards.


If we feel strongly about this, we could define the deregister/export
operation as an optional endpoint. Catalog implementations could choose
whether to support it, allowing users to make informed decisions based on
feature availability when selecting a catalog. Once the feature becomes
broadly adopted, we could move the endpoint into the set of required table
endpoints.
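
A minimal sketch of how a client might check for such an optional endpoint before planning a migration. It assumes the catalog lists its supported routes in the `endpoints` field of its `GET /v1/config` response. The export route shown is hypothetical (no such endpoint exists in the spec today), and `can_migrate_out` is an illustrative helper name.

```python
import json

# Hypothetical route: the REST spec has no export/deregister endpoint today.
EXPORT_ENDPOINT = "POST /v1/{prefix}/namespaces/{namespace}/tables/{table}/export"
REGISTER_ENDPOINT = "POST /v1/{prefix}/namespaces/{namespace}/register"


def supported_endpoints(config_response_json: str) -> set[str]:
    """Return the endpoint strings advertised in a /v1/config response body."""
    config = json.loads(config_response_json)
    # If the list is absent, this sketch treats nothing as advertised; a real
    # client would fall back to the spec's default endpoint set instead.
    return set(config.get("endpoints", []))


def can_migrate_out(config_response_json: str) -> bool:
    """True if the catalog advertises the (hypothetical) optional export endpoint."""
    return EXPORT_ENDPOINT in supported_endpoints(config_response_json)


example = json.dumps({
    "defaults": {},
    "overrides": {},
    "endpoints": ["GET /v1/{prefix}/namespaces", REGISTER_ENDPOINT, EXPORT_ENDPOINT],
})
print(can_migrate_out(example))  # True
```

A catalog that does not implement the endpoint would simply leave it out of the list, which is what would let users compare catalogs on this feature before choosing one.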


On Wed, Jan 28, 2026 at 20:38, Daniel Weeks <[email protected]>
wrote:

> I think there's good reason to consider a "deregister" or "export" like
> functionality given that there isn't a clear path to hand off ownership of
> a table between catalogs. This is a slightly different motivation for
> similar functionality, but shares the same underlying goal of improving
> portability.
>
> Even without this, there are ways to capture the metadata (e.g. persist
> the json response and use that as the metadata reference for registering),
> so I don't think the absence of a physical json file is really a blocker.
> We originally wanted to preserve the physical representation to both adhere
> to the spec language regarding how commits are effected and to ensure
> access for older clients that do not support the REST Catalog.  At this
> point, REST support is nearing ubiquity and the metadata representation is
> still available in some form (though less convenient for direct file
> reference).
>
> I'm a little concerned about using the REST spec as a means to force
> portability on implementations.  I feel that level of requirement could
> result in a reluctance to provide interoperability which would limit access
> to data or normalize non-compliance with the spec.  Ultimately, I feel user
> demand will drive the goals of openness and portability, which is a trend
> we see across the ecosystem and continues to drive interest in open formats
> and standards.
>
> -Dan
>
> On Wed, Jan 28, 2026 at 7:55 AM Russell Spitzer <[email protected]>
> wrote:
>
>>> Prior to the introduction of CATALOG_ONLY tables, reading a table
>>> implicitly required that the full table metadata be accessible to readers.
>>> This made it possible to migrate a table between catalog implementations by
>>> simply pointing */v1/{prefix}/namespaces/{namespace}/register* to the
>>> existing metadata.json, assuming the appropriate user privileges were in
>>> place.
>>
>>
>> This actually hasn’t been the case for quite a while across several
>> vendors (though not the one I work at — we still expose full metadata).
>> Nothing prevents this; in fact, several vendors are already shipping
>> Iceberg metadata that does not strictly represent the table.
>> Properties, snapshots, or even the table itself can redirect to another
>> representation of the same table, leaving no way to recover a true “ground
>> truth” view via the REST API. I’m also aware of folks shipping different
>> versions of the metadata or exposing what is essentially a read-only
>> metadata.json layered on top of a table in another format. So I think
>> the ship has largely sailed on relying on metadata as a guaranteed
>> canonical view.
>>
>> I do think it’s still important to preserve *portability*, or at least
>> to make it clear to end users whether or not their tables will be portable.
>> With that in mind, I was wondering if we should introduce an explicit
>> catalog export command that is essentially the inverse of register.
>> Unlike loadTable, it would be required to produce the path of a
>> metadata.json that represents the entire Iceberg table without modification.
>>
>> That would give catalogs a clear way to signal whether they support
>> “unregistering” a table in a way that lets it be used in another system. We
>> could also scope permissions for this functionality so that only specific
>> users are allowed to perform an export.
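
To illustrate the round trip Russell describes, here is a minimal sketch of the register half, assuming a (hypothetical) export call has already returned the path of a complete metadata.json. It builds the path and body for `POST /v1/{prefix}/namespaces/{namespace}/register`, joining multipart namespaces with the 0x1F unit separator as the REST spec describes. The prefix, namespace, table name, and metadata path in the usage lines are made up.

```python
import json
from urllib.parse import quote


def register_request(prefix: str, namespace: list[str], table: str,
                     metadata_location: str) -> tuple[str, str]:
    """Build the path and JSON body for the REST catalog register call."""
    # Multipart namespaces are joined with the 0x1F unit separator, then URL-encoded.
    encoded_ns = quote("\x1f".join(namespace), safe="")
    path = f"/v1/{quote(prefix, safe='')}/namespaces/{encoded_ns}/register"
    body = json.dumps({"name": table, "metadata-location": metadata_location})
    return path, body


# Hypothetical values: the location would come from the proposed export command.
path, body = register_request(
    "prod", ["analytics", "web"], "events",
    "s3://bucket/analytics/web/events/metadata/00042-example.metadata.json")
print(path)  # /v1/prod/namespaces/analytics%1Fweb/register
```

The value of an explicit export step is that the `metadata-location` passed here would be guaranteed to describe the whole table without modification, rather than whatever view loadTable happens to return.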
>>
>>
>>
>> On Wed, Jan 28, 2026 at 5:42 AM Péter Váry <[email protected]>
>> wrote:
>>
>>> > I am not sure about the concern for lock-in. Users are free to adopt
>>> > any catalog that is spec compliant. Catalog-only tables are not a choice
>>> > made by the catalog vendor/provider; they are the table owner's choice,
>>> > made for access control.
>>>
>>> Prior to the introduction of CATALOG_ONLY tables, reading a table
>>> implicitly required that the full table metadata be accessible to readers.
>>> This made it possible to migrate a table between catalog implementations by
>>> simply pointing */v1/{prefix}/namespaces/{namespace}/register* to the
>>> existing metadata.json, assuming the appropriate user privileges were in
>>> place.
>>>
>>> With CATALOG_ONLY tables, this implicit requirement is removed, and no
>>> alternative requirement is introduced. As a result, migrating the complete
>>> history of a table may become impossible without performing a manual
>>> traversal of the plan(s) and metadata.
>>>
>>> What I am suggesting is that the ability to re‑register an Iceberg table
>>> with a different catalog should be an explicit requirement for a
>>> spec‑compliant catalog.
>>>
>>> > Also this proposal doesn't say that the write path shouldn't produce
>>> > the metadata.json file, which is still required today to be spec compliant.
>>>
>>> The Iceberg table specification describes metadata.json and manifest
>>> files, but after this change a catalog could be fully compliant with the
>>> Iceberg REST Catalog specification while still not exposing these files in
>>> a way that is accessible to users. This would effectively prevent use cases
>>> such as migrating tables between catalogs.
>>>
>>>
>>> On Mon, Jan 26, 2026 at 20:33, Steven Wu <[email protected]>
>>> wrote:
>>>
>>>> Catching up on this thread.
>>>>
>>>> I am not sure about the concern for lock-in. Users are free to adopt
>>>> any catalog that is spec compliant. Catalog-only tables are not a choice
>>>> made by the catalog vendor/provider; they are the table owner's choice,
>>>> made for access control.
>>>>
>>>> Also this proposal doesn't say that the write path shouldn't produce
>>>> the metadata.json file, which is still required today to be spec compliant.
>>>> It is just that clients may not need to load the metadata.json (and
>>>> manifest list, manifest files) directly for client-side scan planning.
>>>>
>>>> I also like Dan's suggestion of not including client preference/config
>>>> in the spec.
>>>>
>>>> > I want to highlight that introducing "CATALOG_ONLY" planners
>>>> > implicitly creates a new requirement for all compliant engines. Without
>>>> > support for this, engines would be unable to read these new tables. This
>>>> > seems like a significant change that we should call out explicitly.
>>>>
>>>> I agree with Peter that this is a significant new requirement for
>>>> engines. Iceberg libraries (Java or other languages) can probably hide it
>>>> internally in the scan planning implementation, but some engines may not
>>>> use Iceberg libraries, and for them this would be a new requirement.
>>>>
>>>>
>>>>
>>>> On Tue, Jan 20, 2026 at 4:55 PM Prashant Singh <
>>>> [email protected]> wrote:
>>>>
>>>>> Thank you, Peter. I will find a slot that works for most of the folks
>>>>> interested in the discussion and put it on the dev calendar.
>>>>>
>>>>> Regarding the agenda: I would like to keep the discussion focused on
>>>>> what it means to have a planning mode like CATALOG_ONLY, its use cases,
>>>>> and its side effects. For example, read-only tables are something that
>>>>> can be done today; in fact, folks use them in production. Tools such as
>>>>> Apache XTable (incubating) or UniForm generate Iceberg metadata on top
>>>>> of existing data files. Having CATALOG_ONLY doesn't change much, except
>>>>> that this synthetic metadata no longer needs to be written. It was
>>>>> synthetic in the first place, since an Iceberg client didn't generate
>>>>> it, and the catalog is already fully capable of doing that.
>>>>>
>>>>> With that being said, I will definitely put all your suggestions on
>>>>> the agenda; let's discuss this in more depth to better understand the
>>>>> feedback. I would also like to include a discussion of the mode types.
>>>>> Maybe we should keep just CLIENT_ONLY and CATALOG_ONLY for now, since the
>>>>> preference modes may be too much for the first phase?
>>>>>
>>>>> I will circle back with a concrete time, meeting links, etc., and
>>>>> post them here!
>>>>>
>>>>> Best,
>>>>> Prashant Singh
>>>>>
>>>>> On Sat, Jan 17, 2026 at 11:28 PM Péter Váry <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Hi Prashant,
>>>>>>
>>>>>> I agree that having a dedicated sync makes a lot of sense. I’d
>>>>>> suggest the following agenda items:
>>>>>>
>>>>>> 1. *Read-only tables*
>>>>>> During the early discussions around the File Format API, I suggested
>>>>>> starting with the read path, as this would allow us to integrate new data
>>>>>> sources more quickly. At the time, there were strong objections, with the
>>>>>> argument that every Iceberg table should be fully readable and writable
>>>>>> through Iceberg in order to be considered a “real” Iceberg table. I’m
>>>>>> interested to understand whether this position has changed since then.
>>>>>>
>>>>>> 2. *Table migration*
>>>>>> I see clear benefits in generating table metadata on the fly (e.g.,
>>>>>> easier integration with fast-changing systems, stricter security models,
>>>>>> and potential performance gains). My concern is that, if we allow this
>>>>>> without constraints, a fully compliant Iceberg catalog could choose not
>>>>>> to materialize metadata at all. This would make migration to another
>>>>>> compliant Iceberg catalog much harder. Openness and easy migration are
>>>>>> major selling points of Iceberg, and I think we should continue to
>>>>>> enforce those values.
>>>>>>
>>>>>> 3. *Engine compatibility*
>>>>>> I want to highlight that introducing "CATALOG_ONLY" planners
>>>>>> implicitly creates a new requirement for all compliant engines. Without
>>>>>> support for this, engines would be unable to read these new tables. This
>>>>>> seems like a significant change that we should call out explicitly.
>>>>>>
>>>>>> 4. *CATALOG_ONLY tables*
>>>>>> If we reach agreement on the points above, I think the decision on
>>>>>> this topic will naturally follow.
>>>>>>
>>>>>> My current perspective on these topics:
>>>>>>
>>>>>> 1. *Read-only tables*
>>>>>> I like this idea, as it would allow Iceberg catalogs to more easily
>>>>>> expose external table formats such as Delta, Lance, and others. My main
>>>>>> hesitation is that I’ve proposed this before and it was strongly rejected
>>>>>> by the community.
>>>>>>
>>>>>> 2. *Table migration*
>>>>>> My concern is that we may be taking incremental steps away from
>>>>>> Iceberg’s original position of full compliance, easy migration, and broad
>>>>>> compatibility, toward a more closed, catalog-bounded model. I’d like us
>>>>>> to step back and clearly define our core values, then enforce them in
>>>>>> the specification. This could be as simple as a few sentences in the
>>>>>> "LoadTableResponse" description requiring a way (for some users) to
>>>>>> obtain the full metadata JSON along with the corresponding manifest and
>>>>>> data files, or perhaps a dedicated migration endpoint that allows one
>>>>>> catalog to take over a table from another.
>>>>>>
>>>>>> 3. *Engine compatibility*
>>>>>> I have the sense that this “small” enum change actually introduces a
>>>>>> fairly large new requirement for engines, and I want to make sure we
>>>>>> explicitly highlight that.
>>>>>>
>>>>>> 4. *CATALOG_ONLY tables*
>>>>>> As above, I think our answers to the earlier questions will
>>>>>> effectively determine our position here.
>>>>>>
>>>>>> Overall, I like your proposal, but in a few areas it seems to move us
>>>>>> in a different direction from what we previously agreed on. I’d like to
>>>>>> understand whether the community is aligned with this new direction.
>>>>>>
>>>>>> Thanks,
>>>>>> Peter
>>>>>>
>>>>>>
>>>>>> On Thu, Jan 15, 2026, 20:34 Prashant Singh <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Thank you for the discussion, everyone;
>>>>>>> I really appreciate all of you taking the time!
>>>>>>>
>>>>>>> Unfortunately, we ran out of time and were not able to discuss this in
>>>>>>> the catalog sync this week, so I was wondering whether all the
>>>>>>> interested folks would be open to a dedicated discussion.
>>>>>>> I can go ahead and request one on the Iceberg calendar.
>>>>>>>
>>>>>>> Peter :
>>>>>>>
>>>>>>> > With the introduction of CATALOG_ONLY tables, storing Iceberg
>>>>>>> > metadata files is no longer required for any operation
>>>>>>>
>>>>>>> I am not sure I fully understand the concern here. The client still
>>>>>>> writes the manifests and manifest lists for the table, and these are
>>>>>>> handed to the catalog, which creates and tracks the metadata.json. For
>>>>>>> writes, we need access to these manifests, especially for cases such as
>>>>>>> validating that no new data has been inserted into the table (conflict
>>>>>>> detection); please see validateAddedDataFiles [1]. This can't be
>>>>>>> achieved through scan planning, at least not without breaking existing
>>>>>>> Iceberg clients, because these validations happen on the client side
>>>>>>> based on the isolation level. Without the manifests, these tables would
>>>>>>> be unusable for writes from clients.
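
Why the manifests matter for writes can be seen in a simplified model of the check: before committing, the writer walks the snapshots committed since it started and inspects the data files each one added. This is a toy sketch, not Iceberg's `validateAddedDataFiles`; the `Snapshot` type, its `added_files` field, and the `conflicts` predicate are hypothetical stand-ins for reading each snapshot's manifests and applying the conflict-detection filter.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Snapshot:
    snapshot_id: int
    parent_id: Optional[int]
    # Hypothetical: in Iceberg this list comes from reading the snapshot's manifests.
    added_files: list[str] = field(default_factory=list)


class ValidationException(Exception):
    pass


def validate_no_new_data(snapshots: dict[int, Snapshot], current_id: int,
                         starting_id: Optional[int],
                         conflicts: Callable[[str], bool]) -> None:
    """Raise if a snapshot committed after starting_id added a conflicting file."""
    snapshot_id: Optional[int] = current_id
    # Walk back from the current snapshot until reaching the writer's starting point.
    while snapshot_id is not None and snapshot_id != starting_id:
        snapshot = snapshots[snapshot_id]
        for path in snapshot.added_files:
            if conflicts(path):
                raise ValidationException(
                    f"Found conflicting file {path} in snapshot {snapshot_id}")
        snapshot_id = snapshot.parent_id


history = {
    1: Snapshot(1, None, ["dt=2026-01-14/a.parquet"]),
    2: Snapshot(2, 1, ["dt=2026-01-16/b.parquet"]),
    3: Snapshot(3, 2, ["dt=2026-01-15/c.parquet"]),
}
```

A writer that started at snapshot 1 and overwrites partition `dt=2026-01-15` would fail this check because snapshot 3 added a file there, while one touching `dt=2026-01-17` would pass. Either way the check needs the added-file listings, which is exactly what the manifests provide.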
>>>>>>>
>>>>>>> For read-only tables, I am not sure they enable vendor lock-in beyond
>>>>>>> what can already be achieved today. Either way, I believe this could be
>>>>>>> circumvented if we clarify/tighten the metadata-location expectation in
>>>>>>> the spec: it should exactly reflect the state of the table as committed
>>>>>>> by clients, i.e., it should contain precise references to the manifests
>>>>>>> and manifest lists that the client created.
>>>>>>>
>>>>>>> With that being said, could everyone interested in this thread let me
>>>>>>> know whether you are open to a dedicated community discussion on this?
>>>>>>> I'm happy to brainstorm together and reach consensus.
>>>>>>>
>>>>>>> [1]
>>>>>>> https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java#L377
>>>>>>>
>>>>>>> Best,
>>>>>>> Prashant Singh
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jan 14, 2026 at 7:38 AM Péter Váry <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Dan,
>>>>>>>>
>>>>>>>> > While it is possible and may feel like it would prevent
>>>>>>>> > interoperability, that would be easily circumvented by just copying the
>>>>>>>> > entire contents of the table through scan/plan.
>>>>>>>>
>>>>>>>> This enables the user to recreate a snapshot of the table, but it
>>>>>>>> does not provide the full history or complete table metadata. It is
>>>>>>>> also significantly more involved than simply calling the register table
>>>>>>>> operation.
>>>>>>>>
>>>>>>>> > REST Catalog implementations have always been able to restrict
>>>>>>>> > access to physical storage regardless of whether a client could load
>>>>>>>> > the table metadata or not.
>>>>>>>>
>>>>>>>> Previously, this was primarily a matter of gaining access to the
>>>>>>>> underlying storage. With the introduction of CATALOG_ONLY tables,
>>>>>>>> storing Iceberg metadata files is no longer required for any operation.
>>>>>>>>
>>>>>>>> > there are lots of different ways closed systems can restrict
>>>>>>>> > access already (e.g. jdbc only or proprietary APIs), so I don't feel
>>>>>>>> > like this is changing that dynamic.
>>>>>>>>
>>>>>>>> I’m not sure I understand this. Could you please provide more
>>>>>>>> details?
>>>>>>>>
>>>>>>>> The goal, as I understand it, is that if a Catalog implements the
>>>>>>>> Iceberg specification, migration to and from this Catalog should be
>>>>>>>> possible with any other Catalog that adheres to the same specification.
>>>>>>>> Introducing CATALOG_ONLY tables, however, feels like another step away
>>>>>>>> from interoperability.
>>>>>>>>
>>>>>>>> > I think the motivation behind catalog only mode is more for cases
>>>>>>>> > where the underlying data is either in a different representation or is
>>>>>>>> > being adapted on-the-fly.  For example, if you wanted to expose a table
>>>>>>>> > from a database that can export data to parquet, but doesn't natively
>>>>>>>> > support Iceberg as a format, you can hide that behind scan plan
>>>>>>>> > interfaces.
>>>>>>>>
>>>>>>>> Using the Scan Planning interface has been optional until now, but
>>>>>>>> with the introduction of CATALOG_ONLY tables, it becomes mandatory. As
>>>>>>>> a result, compliant engines will need to implement it.
>>>>>>>>
>>>>>>>> > There may not be a full representation of the table metadata but
>>>>>>>> > using a subset of Iceberg primitives, you can still achieve
>>>>>>>> > interoperability (at least for read).
>>>>>>>>
>>>>>>>> In earlier discussions, we agreed that tables should not implement
>>>>>>>> only a subset of the Iceberg specification. This proposal seems to move
>>>>>>>> in a different direction. While I’m not opposed to the feature and
>>>>>>>> recognize the benefits of integrating non-Iceberg tables into Iceberg
>>>>>>>> catalogs and making them queryable by compatible engines, I believe it
>>>>>>>> would be useful to clarify our current understanding of the boundaries
>>>>>>>> and the level of feature parity we aim to maintain. Establishing this
>>>>>>>> would provide a consistent framework for evaluating similar proposals
>>>>>>>> going forward.
>>>>>>>>
>>>>>>>> This seems like a good candidate for today’s catalog sync
>>>>>>>> discussion.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Peter
>>>>>>>>
>>>>>>>> On Wed, Jan 14, 2026 at 0:23, Daniel Weeks <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I don't feel we should be too concerned about catalogs switching
>>>>>>>>> to a "catalog only" mode and not providing direct access.  While it is
>>>>>>>>> possible and may feel like it would prevent interoperability, that
>>>>>>>>> would be easily circumvented by just copying the entire contents of
>>>>>>>>> the table through scan/plan.  I wouldn't agree there was implied access
>>>>>>>>> just by having a metadata-location field either.  REST Catalog
>>>>>>>>> implementations have always been able to restrict access to physical
>>>>>>>>> storage regardless of whether a client could load the table metadata
>>>>>>>>> or not.  I understand the concern about lock-in, but there are lots of
>>>>>>>>> different ways closed systems can restrict access already (e.g. jdbc
>>>>>>>>> only or proprietary APIs), so I don't feel like this is changing that
>>>>>>>>> dynamic.
>>>>>>>>>
>>>>>>>>> I think the motivation behind catalog only mode is more for cases
>>>>>>>>> where the underlying data is either in a different representation or
>>>>>>>>> is being adapted on-the-fly.  For example, if you wanted to expose a
>>>>>>>>> table from a database that can export data to parquet, but doesn't
>>>>>>>>> natively support Iceberg as a format, you can hide that behind scan
>>>>>>>>> plan interfaces.  There may not be a full representation of the table
>>>>>>>>> metadata but using a subset of Iceberg primitives, you can still
>>>>>>>>> achieve interoperability (at least for read).
>>>>>>>>>
>>>>>>>>> Introducing modes is just a way to express the intent/availability
>>>>>>>>> for the scan plan and coordinate between the client and server, but I
>>>>>>>>> don't think it really affects whether a client could be prevented from
>>>>>>>>> reading table data directly (a catalog can do that regardless).
>>>>>>>>>
>>>>>>>>> I would add that I don't think the spec should include anything
>>>>>>>>> about the client modes (I added a comment to the PR on this).  The
>>>>>>>>> spec should only indicate what the server can return and what the
>>>>>>>>> expectations should be for a client.  What a client implements and
>>>>>>>>> what configurations it exposes is more of a client-side implementation
>>>>>>>>> detail and should not be part of the spec.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -Dan
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Jan 13, 2026 at 11:07 AM Prashant Singh <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hello Peter,
>>>>>>>>>> Thank you for the feedback.
>>>>>>>>>>
>>>>>>>>>> IIUC, you mean that under one interpretation, the metadata location
>>>>>>>>>> could be a dummy file that, in the worst case, simply does not exist?
>>>>>>>>>> Sure, I believe we can be explicit there to avoid this.
>>>>>>>>>> Note: this predates this proposal, but I'm happy to take a stab at
>>>>>>>>>> being explicit here.
>>>>>>>>>>
>>>>>>>>>> > users were required to have direct read access to the metadata
>>>>>>>>>> > files in order to plan queries on the table. That implied an access
>>>>>>>>>> > requirement, even though it was never explicitly documented
>>>>>>>>>>
>>>>>>>>>> While the requirement is true, it's not as if every user would get
>>>>>>>>>> credentials to do so. It was strictly based on whether the user is
>>>>>>>>>> authorized to read the table according to the privileges defined in the
>>>>>>>>>> catalog. The credentials in the loadTable response were optional, so a
>>>>>>>>>> catalog could choose not to vend any credentials even when the client
>>>>>>>>>> sends X-Iceberg-Access-Delegation, due to this [1], and hence it can
>>>>>>>>>> cut off any client if it wants to. I believe the flexibility is there
>>>>>>>>>> because we don't define authorization in the IRC spec. As I said, the
>>>>>>>>>> admin is the one who gave the catalog access to storage in the first
>>>>>>>>>> place, so if the catalog misbehaves by forcing every table through its
>>>>>>>>>> own planning, the admin can revoke that storage access and migrate to a
>>>>>>>>>> different catalog if the culprit catalog doesn't fix it.
>>>>>>>>>>
>>>>>>>>>> > Maybe we add a sentence in the spec to enforce that there
>>>>>>>>>> > should be some users where the catalog MUST provide access to the
>>>>>>>>>> > metadata files.
>>>>>>>>>>
>>>>>>>>>> Regarding the original feedback: there will always be an ADMIN user
>>>>>>>>>> who configured the catalog in the first place with the storage
>>>>>>>>>> permissions (say, by setting up IAM and establishing the trust
>>>>>>>>>> relationship), and who can therefore get hold of the storage and read
>>>>>>>>>> those metadata files directly. So such users exist implicitly, in that
>>>>>>>>>> sense.
>>>>>>>>>>
>>>>>>>>>> I believe that by introducing a CATALOG_ONLY planning mode on top of
>>>>>>>>>> the existing assumptions, we are not introducing new ways to trap end
>>>>>>>>>> users in vendor lock-in; as has always been the case, a user has a way
>>>>>>>>>> to walk out of it with the existing constructs.
>>>>>>>>>>
>>>>>>>>>> Please let me know what you think, considering the above.
>>>>>>>>>>
>>>>>>>>>> [1]
>>>>>>>>>> https://github.com/apache/iceberg/blob/fc434997fbc63a3f1f47481c0878073b1ccf6359/open-api/rest-catalog-open-api.yaml#L1886-L1887
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Prashant Singh
>>>>>>>>>>
>>>>>>>>>> On Tue, Jan 13, 2026 at 6:11 AM Péter Váry <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Prashant,
>>>>>>>>>>>
>>>>>>>>>>> The specification states:
>>>>>>>>>>>
>>>>>>>>>>>> The corresponding file location of table metadata should be
>>>>>>>>>>>> returned in the `metadata-location` field
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> However, it does not specify that this location must be readable
>>>>>>>>>>> by any users. (Perhaps this is something we should revisit and
>>>>>>>>>>> clarify going forward.)
>>>>>>>>>>>
>>>>>>>>>>> Before the introduction of CATALOG_ONLY tables, users were
>>>>>>>>>>> required to have direct read access to the metadata files in order to
>>>>>>>>>>> plan queries on the table. That implied an access requirement, even
>>>>>>>>>>> though it was never explicitly documented. With the introduction of
>>>>>>>>>>> CATALOG_ONLY, this implicit requirement no longer applies, and we
>>>>>>>>>>> currently do not have an explicit requirement defined in the
>>>>>>>>>>> specification either.
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Jan 12, 2026 at 23:33, Prashant Singh <[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thank you for the feedback everyone !
>>>>>>>>>>>>
>>>>>>>>>>>> Eduard: I am open to naming it _ENFORCED, or even to not having
>>>>>>>>>>>> _ONLY or _ENFORCED in the first place, as Dan suggested here. Please
>>>>>>>>>>>> let me know if you are OK with that, as per [1].
>>>>>>>>>>>>
>>>>>>>>>>>> Amogh: Thank you for the feedback on the preference mode. I tried
>>>>>>>>>>>> to document some concrete use cases that could benefit from it [2],
>>>>>>>>>>>> as I believe it can give the catalog and client some options to
>>>>>>>>>>>> negotiate when both are open to it. Please let me know what you think.
>>>>>>>>>>>>
>>>>>>>>>>>> Peter: I believe this kind of vendor lock-in would not be possible
>>>>>>>>>>>> with the model we are going after: loadTable itself returns the
>>>>>>>>>>>> metadata pointer, which is self-describing and can be used to register
>>>>>>>>>>>> the table in a new catalog. Also, the way the catalog (IRC) has been
>>>>>>>>>>>> laid out decouples compute from storage, so in the end it's the admin
>>>>>>>>>>>> user of the catalog who gave the catalog its admin storage credentials,
>>>>>>>>>>>> which get scoped down based on the grants defined in the catalog. That
>>>>>>>>>>>> admin can simply revoke the catalog's access or configure a new catalog
>>>>>>>>>>>> with different admin storage credentials.
>>>>>>>>>>>> I tried to elaborate on this in the PR feedback too [3]; please let me
>>>>>>>>>>>> know what you think.
>>>>>>>>>>>>
>>>>>>>>>>>> I will stay on top of both the PR and the thread moving forward.
>>>>>>>>>>>> I appreciate all your feedback.
>>>>>>>>>>>>
>>>>>>>>>>>> [1]
>>>>>>>>>>>> https://github.com/apache/iceberg/pull/14867#discussion_r2673087002
>>>>>>>>>>>> [2]
>>>>>>>>>>>> https://github.com/apache/iceberg/pull/14867#discussion_r2678941794
>>>>>>>>>>>> [3]
>>>>>>>>>>>> https://github.com/apache/iceberg/pull/14867#discussion_r2678376025
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Prashant Singh
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Jan 9, 2026 at 10:34 PM Péter Váry <
>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I have a concern about some catalogs starting to make every
>>>>>>>>>>>>> table `CATALOG_ONLY`, which would essentially lock users to the
>>>>>>>>>>>>> catalog without providing a way to migrate the data to another
>>>>>>>>>>>>> catalog. Maybe we should add a sentence to the spec requiring that
>>>>>>>>>>>>> there be some users for whom the catalog MUST provide access to the
>>>>>>>>>>>>> metadata files.
>>>>>>>>>>>>>
>>>>>>>>>>>>> WDYT?
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Jan 8, 2026, 18:38 Amogh Jahagirdar <[email protected]>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I did a pass over the PR, but I'm a little skeptical about what the
>>>>>>>>>>>>>> notion of "preferences" truly gets us in the protocol. In case the
>>>>>>>>>>>>>> endpoint is available but not enforced, my mental model is to just
>>>>>>>>>>>>>> let the client make whatever choice it wants. If a server really
>>>>>>>>>>>>>> thinks it's advantageous to use remote planning, I'd think it would
>>>>>>>>>>>>>> just say server-side planning is enforced. For the "momentary load"
>>>>>>>>>>>>>> case, all a client would need to do is handle server throttling and
>>>>>>>>>>>>>> fall back to client-side planning (I don't think the protocol needs
>>>>>>>>>>>>>> to expand just for that).
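
The throttling fallback described above needs no protocol change; a client could handle it roughly as below. This is a minimal sketch assuming the client has already chosen catalog-side planning for a table where it is available. `ThrottledError`, `plan_remotely`, and `plan_locally` are hypothetical stand-ins for a 429/503 response and the two planning paths, and file scan tasks are modeled as plain strings.

```python
from typing import Callable


class ThrottledError(Exception):
    """Hypothetical: raised when the planning endpoint answers 429 or 503."""


def plan_scan(table: str, enforced: bool,
              plan_remotely: Callable[[str], list[str]],
              plan_locally: Callable[[str], list[str]]) -> list[str]:
    """Try catalog-side planning first, falling back to local planning if throttled."""
    try:
        return plan_remotely(table)
    except ThrottledError:
        if enforced:
            # Server-side planning is enforced, so local planning is not an option
            # (the client may lack credentials); surface the error so it can retry.
            raise
        return plan_locally(table)


def busy_server(table: str) -> list[str]:
    raise ThrottledError(table)


print(plan_scan("db.events", False, busy_server, lambda t: [f"{t}/task-0"]))
# ['db.events/task-0']
```

The only input the server has to provide is whether planning is enforced; the fallback decision stays entirely on the client.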
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Jan 7, 2026 at 11:28 AM Russell Spitzer <
>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'm in agreement with Prashant's current plan. I have no
>>>>>>>>>>>>>>> preference on the naming of Only vs. Enforced.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Jan 7, 2026 at 4:42 AM Eduard Tudenhöfner <
>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Instead of calling it "ONLY", maybe "ENFORCED" would be a
>>>>>>>>>>>>>>>> better term? I think that would more naturally express the 
>>>>>>>>>>>>>>>> behavior without
>>>>>>>>>>>>>>>> having to define what "ONLY" really means.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Wed, Dec 24, 2025 at 12:05 AM Prashant Singh <
>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> *Hi everyone,*
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> *JB:* Mostly yes, but it's more about what the server
>>>>>>>>>>>>>>>>> wants the client to do. The server can indicate whether it
>>>>>>>>>>>>>>>>> supports a mode via the /v1/config endpoint at this point.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> *Russell:* Thank you for the thorough feedback! I think
>>>>>>>>>>>>>>>>> it is a great idea to break the optional mode into *Prefer
>>>>>>>>>>>>>>>>> Client | Prefer Catalog*—it really opens up a lot of
>>>>>>>>>>>>>>>>> interesting use cases.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> For example, the server might support planning but, due to
>>>>>>>>>>>>>>>>> momentary load, want to see whether the client is open to planning on
>>>>>>>>>>>>>>>>> the client side. Similarly, an argument can be made that if the
>>>>>>>>>>>>>>>>> server has a table cached in memory, it would prefer that the client
>>>>>>>>>>>>>>>>> come to the server. Earlier, with just the optional value, we were
>>>>>>>>>>>>>>>>> simply falling back to server- or client-side planning based on
>>>>>>>>>>>>>>>>> whether the server supported scan planning. Now, the client can
>>>>>>>>>>>>>>>>> express its own overrides via catalog configs as well.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Based on our offline discussion, I have incorporated the
>>>>>>>>>>>>>>>>> feedback into the updated matrix [1] to document what the planning
>>>>>>>>>>>>>>>>> modes would be based on the server response and client overrides:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>    - *CLIENT_ONLY + CATALOG_ONLY* = FAIL
>>>>>>>>>>>>>>>>>    - *One "ONLY" + opposite "PREFERRED"* = ONLY wins
>>>>>>>>>>>>>>>>>    - *Both "PREFERRED"* = Client config wins
>>>>>>>>>>>>>>>>>    - *Client not configured* = Use server config or default
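
The four rules above can be captured in a small resolution function. This is a minimal sketch: the `Mode` names, the `resolve` helper, and the caller-supplied `default` are hypothetical, and two cases the matrix does not list are filled in by assumption (an ONLY mode also wins over a PREFERRED mode on the same side, and a client override applies when the server sends no mode).

```python
from enum import Enum, auto
from typing import Optional


class Mode(Enum):
    # Hypothetical names for the four planning modes discussed in this thread.
    CLIENT_ONLY = auto()
    CLIENT_PREFERRED = auto()
    CATALOG_PREFERRED = auto()
    CATALOG_ONLY = auto()


ONLY_MODES = {Mode.CLIENT_ONLY, Mode.CATALOG_ONLY}
CLIENT_SIDE = {Mode.CLIENT_ONLY, Mode.CLIENT_PREFERRED}


def resolve(server: Optional[Mode], client: Optional[Mode], default: Mode) -> Mode:
    """Combine the server's mode with the client's override per the matrix."""
    if client is None:
        # Client not configured: use the server config, else the default.
        return server if server is not None else default
    if server is None:
        # Assumption: the client override applies when the server sends nothing.
        return client
    if server in ONLY_MODES and client in ONLY_MODES:
        if (server in CLIENT_SIDE) != (client in CLIENT_SIDE):
            raise ValueError(f"Incompatible planning modes: {server} vs {client}")
        return server
    if server in ONLY_MODES:
        return server  # An ONLY mode beats a PREFERRED one.
    # Either the client is ONLY (ONLY beats PREFERRED) or both are PREFERRED,
    # in which case the client config wins; both cases yield the client mode.
    return client


print(resolve(Mode.CATALOG_ONLY, Mode.CLIENT_PREFERRED, Mode.CLIENT_PREFERRED).name)
# CATALOG_ONLY
```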
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I will update the reference implementation soon based on
>>>>>>>>>>>>>>>>> this. I would love to know what other folks think!
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Prashant Singh
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>> https://github.com/apache/iceberg/pull/14867#issuecomment-3683989832
>>>>>>>>>>>>>>>>>
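The resolution matrix in Prashant's reply above can be sketched as a small function. This is a minimal Python sketch under stated assumptions, not the reference implementation in the PR: the CLIENT_PREFERRED and CATALOG_PREFERRED names, the IncompatiblePlanningModes exception, the fallback default, and the handling of cases the matrix does not list (an unconfigured server, or both sides picking the same side) are all hypothetical.

```python
from enum import Enum
from typing import Optional


class ScanPlanningMode(Enum):
    # Hypothetical names modeled on the modes discussed in this thread.
    CLIENT_ONLY = "client-only"
    CLIENT_PREFERRED = "client-preferred"
    CATALOG_PREFERRED = "catalog-preferred"
    CATALOG_ONLY = "catalog-only"

    @property
    def is_only(self) -> bool:
        return self in (ScanPlanningMode.CLIENT_ONLY, ScanPlanningMode.CATALOG_ONLY)

    @property
    def on_client(self) -> bool:
        return self in (ScanPlanningMode.CLIENT_ONLY, ScanPlanningMode.CLIENT_PREFERRED)


class IncompatiblePlanningModes(Exception):
    """Raised when the two sides demand opposite ONLY modes."""


def resolve_mode(
    server: Optional[ScanPlanningMode],
    client: Optional[ScanPlanningMode],
    default: ScanPlanningMode = ScanPlanningMode.CLIENT_PREFERRED,  # assumed default
) -> ScanPlanningMode:
    # Client not configured: use the server's mode, or the default.
    if client is None:
        return server if server is not None else default
    # Assumption: an unconfigured server defers to the client.
    if server is None:
        return client
    # CLIENT_ONLY + CATALOG_ONLY fails; matching ONLY modes simply agree.
    if server.is_only and client.is_only:
        if server.on_client != client.on_client:
            raise IncompatiblePlanningModes(f"server={server.name}, client={client.name}")
        return server
    # An ONLY on one side beats a PREFERRED on the other.
    if server.is_only:
        return server
    if client.is_only:
        return client
    # Both PREFERRED: the client's configuration wins.
    return client
```

For example, a server returning CATALOG_PREFERRED with a client configured for CLIENT_ONLY resolves to CLIENT_ONLY, while CATALOG_ONLY against CLIENT_ONLY raises. Treating ONLY as strictly stronger than PREFERRED keeps the rule symmetric; the client only gets the last word when both sides merely express a preference.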
>>>>>>>>>>>>>>>>> On Sat, Dec 20, 2025 at 1:26 PM Russell Spitzer <
>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I can imagine one more
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> (None - I would rename this) ClientOnly - Client can use
>>>>>>>>>>>>>>>>>> Catalog Planning or Local Planning
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> PreferClient - Client should use local planning, but the
>>>>>>>>>>>>>>>>>> plan API is available for this table. I can only imagine this
>>>>>>>>>>>>>>>>>> would be useful in a scenario where most clients are heavy and
>>>>>>>>>>>>>>>>>> have the resources to do local planning (or engine-distributed
>>>>>>>>>>>>>>>>>> planning) but you still want to support lightweight clients
>>>>>>>>>>>>>>>>>> which can’t really do planning themselves.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> PreferCatalog - Client should use the plan API, but
>>>>>>>>>>>>>>>>>> credentials have been provided to enable local planning. This
>>>>>>>>>>>>>>>>>> is probably a transitional state as we move from clients that
>>>>>>>>>>>>>>>>>> only support local planning to those which can use the plan API.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> CatalogOnly - Clients are not provided with the credentials
>>>>>>>>>>>>>>>>>> required to read the table from the metadata.json alone. If they
>>>>>>>>>>>>>>>>>> do not implement the scan plan API, they should fail fast;
>>>>>>>>>>>>>>>>>> otherwise they will fail when they attempt to load a
>>>>>>>>>>>>>>>>>> manifest_list file. This is used in circumstances where the
>>>>>>>>>>>>>>>>>> catalog is either giving file-specific credentials or protecting
>>>>>>>>>>>>>>>>>> the delivered files in some way, such that their contents have
>>>>>>>>>>>>>>>>>> been specially redacted or something like that.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I assume most catalogs will start with “ClientOnly” or
>>>>>>>>>>>>>>>>>> “None”.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Then, as catalogs begin to support the planning API, we will
>>>>>>>>>>>>>>>>>> see most tables move to PreferCatalog, with some perhaps
>>>>>>>>>>>>>>>>>> extremely heavy or large tables staying as PreferClient or
>>>>>>>>>>>>>>>>>> ClientOnly.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Then catalogs with special protections may have some tables
>>>>>>>>>>>>>>>>>> return CatalogOnly so they can either scope credentials more
>>>>>>>>>>>>>>>>>> tightly or manipulate the files that the client actually has
>>>>>>>>>>>>>>>>>> access to in some way.
>>>>>>>>>>>>>>>>>>
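Russell's per-table modes translate into a client-side decision. A minimal Python sketch, assuming a client that knows whether it implements the plan API and whether it has the resources to plan locally; the TableMode enum, the PlanningUnavailable exception, and the "local"/"catalog" return values are hypothetical, and the first mode in his list (None/ClientOnly, where the client picks freely) is left out.

```python
from enum import Enum


class TableMode(Enum):
    # Hypothetical names for three of the modes in Russell's list.
    PREFER_CLIENT = "PreferClient"
    PREFER_CATALOG = "PreferCatalog"
    CATALOG_ONLY = "CatalogOnly"


class PlanningUnavailable(Exception):
    """Raised up front instead of failing later on an unreadable manifest list."""


def choose_planner(mode: TableMode, supports_plan_api: bool, can_plan_locally: bool) -> str:
    """Return "catalog" or "local" for a client with the given capabilities."""
    if mode is TableMode.CATALOG_ONLY:
        # No credentials for local reads were vended: fail fast rather than
        # when the manifest list is first opened.
        if not supports_plan_api:
            raise PlanningUnavailable("table requires the scan plan API")
        return "catalog"
    if mode is TableMode.PREFER_CATALOG:
        # Credentials were vended, so older clients may still plan locally.
        if supports_plan_api:
            return "catalog"
        if can_plan_locally:
            return "local"
    elif mode is TableMode.PREFER_CLIENT:
        # Heavy clients plan locally; lightweight ones fall back to the plan API.
        if can_plan_locally:
            return "local"
        if supports_plan_api:
            return "catalog"
    raise PlanningUnavailable(f"no usable planner for {mode.value}")
```

The CatalogOnly branch captures the fail-fast behavior Russell describes: the error surfaces before the client tries to open a manifest list it has no credentials for.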
>>>>>>>>>>>>>>>>>> On Sat, Dec 20, 2025 at 1:09 AM Jean-Baptiste Onofré <
>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi Prashant,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> It makes sense to me. I guess we are using catalog properties
>>>>>>>>>>>>>>>>>>> to indicate to the client what the REST server supports, right?
>>>>>>>>>>>>>>>>>>> I will take a look at the PR, but I like the idea.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>>>>>> JB
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Sat, Dec 20, 2025 at 12:53 AM Prashant Singh <
>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hey All,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I wanted to start a discussion about introducing the concept of
>>>>>>>>>>>>>>>>>>>> a REST scan planning mode, which would let the server instruct
>>>>>>>>>>>>>>>>>>>> the client on how to plan a table, via the loadTableResponse or
>>>>>>>>>>>>>>>>>>>> a table-level config override.
>>>>>>>>>>>>>>>>>>>> There are three possible values:
>>>>>>>>>>>>>>>>>>>> 1. *None*: plan on the client side, e.g. because the table is
>>>>>>>>>>>>>>>>>>>> so small that the additional REST request would add more
>>>>>>>>>>>>>>>>>>>> overhead than benefit.
>>>>>>>>>>>>>>>>>>>> 2. *Optional*: the client can choose to plan locally or to
>>>>>>>>>>>>>>>>>>>> trigger server-side planning.
>>>>>>>>>>>>>>>>>>>> 3. *Required*: the client MUST do server-side planning. The
>>>>>>>>>>>>>>>>>>>> server could suggest this if it maintains a better index of the
>>>>>>>>>>>>>>>>>>>> Iceberg metadata, if the client is running on low resources, or
>>>>>>>>>>>>>>>>>>>> if the table is protected. The server MAY enforce this in
>>>>>>>>>>>>>>>>>>>> whatever way it needs so that the client can't bypass it; for
>>>>>>>>>>>>>>>>>>>> example, it could skip vending credentials as part of loadTable
>>>>>>>>>>>>>>>>>>>> and mint them only on planning completion, which would mean a
>>>>>>>>>>>>>>>>>>>> client that doesn't call plan table cannot read the table.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I have proactively created a pull request [1] and would love to
>>>>>>>>>>>>>>>>>>>> hear all your feedback, either here or directly in the PR!
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Wishing you all very happy holidays; it has been great working
>>>>>>>>>>>>>>>>>>>> with you all.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> [1] https://github.com/apache/iceberg/pull/14867
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>> Prashant Singh
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
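The enforcement idea in the original proposal (withhold credentials from loadTable under *Required* and mint them only when planning completes) can be illustrated with a toy in-memory catalog. This is a Python sketch only; ToyCatalog, its dictionary keys, and the "scan-planning-mode" config key are hypothetical and are not the REST spec's actual payloads.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class ToyCatalog:
    """Hypothetical in-memory catalog; not the Iceberg REST API."""

    mode: str  # "none", "optional", or "required", as in the proposal
    issued: set = field(default_factory=set)  # tokens this catalog has minted

    def _mint_credential(self) -> str:
        token = secrets.token_hex(8)
        self.issued.add(token)
        return token

    def load_table(self, table: str) -> dict:
        response = {"table": table, "config": {"scan-planning-mode": self.mode}}
        # Under "required", return no credentials, so a client that skips
        # plan_table has nothing to read the table's manifests with.
        if self.mode != "required":
            response["credentials"] = [self._mint_credential()]
        return response

    def plan_table(self, table: str) -> dict:
        # Credentials are minted only once planning has completed.
        return {
            "status": "completed",
            "tasks": [f"{table}/data/file-0.parquet"],
            "credentials": [self._mint_credential()],
        }

    def is_valid(self, token: str) -> bool:
        return token in self.issued
```

For example, `ToyCatalog("required").load_table("db.events")` returns only the table name and config, while `plan_table` on the same catalog returns a completed plan carrying a freshly minted token.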
