Hi Yufei,

In the proposed NoSQL persistence [1189], pagination consistency across requests will be guaranteed, since the data model tracks changes across the whole catalog. Specifically, a pointer to the "state" of the catalog data can be included in the pagination token.
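
To illustrate the idea, here is a minimal sketch in Python. All names (`PageToken`, `state_id`, `list_page`) are hypothetical and are not taken from the NoSQL persistence PR, which may encode its tokens quite differently. It assumes each catalog state is an immutable, sorted snapshot that can be looked up by ID.

```python
import base64
import json
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class PageToken:
    state_id: str  # hypothetical pointer to the catalog state the listing started from
    last_key: str  # last entity name returned on the previous page


def encode_token(token: PageToken) -> str:
    # Opaque to clients: URL-safe base64 over a small JSON document.
    raw = json.dumps({"state": token.state_id, "last": token.last_key}).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_token(text: str) -> PageToken:
    fields = json.loads(base64.urlsafe_b64decode(text.encode("ascii")))
    return PageToken(state_id=fields["state"], last_key=fields["last"])


def list_page(
    states: Dict[str, List[str]],  # assumption: state ID -> sorted, immutable entity names
    current_state_id: str,
    page_size: int,
    token_text: Optional[str] = None,
) -> Tuple[List[str], Optional[str]]:
    if token_text:
        # Continue against the state pinned by the first request,
        # not against whatever the catalog looks like now.
        token = decode_token(token_text)
        state_id, after = token.state_id, token.last_key
    else:
        state_id, after = current_state_id, None
    size = max(1, page_size)
    remaining = [name for name in states[state_id] if after is None or name > after]
    page = remaining[:size]
    next_token = None
    if len(remaining) > size:
        next_token = encode_token(PageToken(state_id, page[-1]))
    return page, next_token
```

Because the token carries the state ID, a follow-up request keeps reading from the same state even if entities were added in the meantime.
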
As for the JDBC persistence, I'm not sure even single request full entity lists are completely free from the side effects of concurrent transactions. I suppose that depends on the transaction isolation level at the database, which is not strictly controlled by Polaris ATM.

[1189] https://github.com/apache/polaris/pull/1189

Cheers,
Dmitri.

On Thu, Oct 23, 2025 at 5:38 PM Yufei Gu <[email protected]> wrote:

> +1 on introducing pagination to avoid unbounded collections, that’s
> definitely the right direction.
>
> That said, I’d be cautious about completely removing non-paginated
> behavior. There are valid scenarios that rely on retrieving a consistent,
> point-in-time view of data. Pagination across a live dataset can introduce
> drift between pages unless the server supports some form of snapshot
> pinning (e.g., a read timestamp, revision ID, or snapshot token).
>
> It might be worth discussing how we can support these point-in-time
> correctness requirements alongside pagination.
>
> Yufei
>
>
> On Thu, Oct 23, 2025 at 10:54 AM Dmitri Bourlatchkov <[email protected]>
> wrote:
>
> > Hi All,
> >
> > Supporting the "old" (full list, non-paginated) behaviour with a feature
> > flag sounds reasonable to me.
> >
> > I believe the default should still be "off" (i.e. all requests are
> > paginated). Affected deployments will be able to set the flag proactively
> > before upgrading to maintain compatibility.
> >
> > Cheers,
> > Dmitri.
> >
> >
> > On Thu, Oct 23, 2025 at 12:45 PM Andrew Guterman <[email protected]>
> > wrote:
> >
> > > I understand the risks of non-paginated APIs.
> > >
> > > The problem is that coupling deprecation of features to release versions
> > > means that downstream projects are blocked from upgrading Polaris unless
> > > they perform every migration for every breaking change. Feature flags
> > > allow downstream projects to pick their own upgrade path.
> > >
> > > Best,
> > > Andrew
> > >
> > > On Thu, Oct 23, 2025 at 4:22 AM Alexandre Dutra <[email protected]>
> > > wrote:
> > >
> > > > +1 to Robert's proposal of deprecating for removal all non-paginated
> > > > requests to Polaris's own APIs.
> > > >
> > > > For IRC APIs, I'll note that the ObjectMapper that we use already has
> > > > a stream read length protection, see
> > > > PolarisIcebergObjectMapperCustomizer [1]. We could add a stream write
> > > > length protection as well.
> > > >
> > > > Thanks,
> > > > Alex
> > > >
> > > > [1]:
> > > > https://github.com/apache/polaris/blob/20febdaede19fb7c46e120652fdd1a262c2138e4/runtime/service/src/main/java/org/apache/polaris/service/config/PolarisIcebergObjectMapperCustomizer.java#L61-L64
> > > >
> > > > On Thu, Oct 23, 2025 at 12:31 PM Robert Stupp <[email protected]> wrote:
> > > > >
> > > > > Returning full lists, which can be extremely large, can let requests
> > > > > fail on the client or the server, cause overly excessive resource
> > > > > usage or even bring down clients and servers (OOM). That's why most
> > > > > listing endpoints have limits on the response size (# of bytes or
> > > > > elements) and support paging as a 1st class citizen. I think this is
> > > > > what Polaris should do as well.
> > > > >
> > > > > Considering the risks that come with large responses, I think having
> > > > > paging always enabled is the safer approach.
> > > > > I propose to deprecate the ability to return "full response lists" at
> > > > > least for Polaris' own APIs and require pagination after 1 or 2 minor
> > > > > releases.
> > > > >
> > > > > For IRC, if we agree that overly large responses are a risk, we can
> > > > > let requests that would yield too large responses (w/o pagination)
> > > > > fail early and protect both the server and the client.
> > > > >
> > > > > On Mon, Oct 20, 2025 at 7:18 PM Andrew Guterman
> > > > > <[email protected]> wrote:
> > > > > >
> > > > > > Returning the full list when no pageToken is specified would be
> > > > > > necessary for backward compatibility, but a feature flag as you
> > > > > > mentioned above makes sense to me.
> > > > > >
> > > > > > Best,
> > > > > > Andrew
> > > > > >
> > > > > > On Wed, Oct 15, 2025 at 7:08 AM Dmitri Bourlatchkov
> > > > > > <[email protected]> wrote:
> > > > > >
> > > > > > > Hi Eric,
> > > > > > >
> > > > > > > I agree with your points.
> > > > > > >
> > > > > > > What worries me in the Iceberg spec is this statement:
> > > > > > >
> > > > > > > "Clients may initiate the first paginated request by sending an
> > > > > > > empty query parameter `pageToken` to the server."
> > > > > > >
> > > > > > > I think it implies that a client that does not send a pageToken
> > > > > > > parameter can expect to get the full response (not paginated).
> > > > > > >
> > > > > > > This is probably not the right forum to discuss the Iceberg spec,
> > > > > > > but I'd like to avoid this kind of ambiguity in APIs owned by
> > > > > > > Polaris.
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Dmitri.
> > > > > > >
> > > > > > > On Tue, Oct 14, 2025 at 7:45 PM Eric Maynard
> > > > > > > <[email protected]> wrote:
> > > > > > >
> > > > > > > > Hey Dmitri,
> > > > > > > >
> > > > > > > > This actually matches my interpretation of the IRC spec.
> > > > > > > > It says
> > > > > > > > <https://github.com/apache/iceberg/blob/c7df5200df462764ba0b3e81484243532c941caf/open-api/rest-catalog-open-api.yaml#L2024>:
> > > > > > > >
> > > > > > > > > Servers that support pagination should identify the
> > > > > > > > > `pageToken` parameter and return a `next-page-token` in the
> > > > > > > > > response if there are more results available.
> > > > > > > >
> > > > > > > > My interpretation of the above is that next-page-token uniquely
> > > > > > > > describes whether or not more results are available. Not the
> > > > > > > > size of the response. In fact, the spec defines page-size as
> > > > > > > > "an *upper bound* of the number of results that a client will
> > > > > > > > receive". Tangentially, I would prefer if the spec described
> > > > > > > > this in looser terms, such as a "requested upper bound".
> > > > > > > >
> > > > > > > > What the spec does *not* say is that a client can safely assume
> > > > > > > > there are no more results if it receives less than page-size
> > > > > > > > elements. I think that you are probably right that a client
> > > > > > > > exists which makes an incorrect assumption here though :)
> > > > > > > >
> > > > > > > > --EM
> > > > > > > >
> > > > > > > > On Tue, Oct 14, 2025 at 4:37 PM Dmitri Bourlatchkov
> > > > > > > > <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > Hi Andrew and everyone,
> > > > > > > > >
> > > > > > > > > Adding pagination to the Management API would be very helpful.
> > > > > > > > >
> > > > > > > > > As to reusing the pagination parameter semantics of the
> > > > > > > > > Iceberg REST spec... I'm not so sure.
> > > > > > > > >
> > > > > > > > > I do believe that servers should have ultimate control over
> > > > > > > > > page sizes.
> > > > > > > > > So any client-side "size" parameters should be suggestions
> > > > > > > > > or hints at most.
> > > > > > > > >
> > > > > > > > > As a continuation of that approach, the server should always
> > > > > > > > > be able to produce a partial response (with a next page
> > > > > > > > > token) even if the client did not provide any explicit
> > > > > > > > > pagination parameters.
> > > > > > > > >
> > > > > > > > > That said, given that existing clients may expect to get
> > > > > > > > > "full" results from the Management API when they do not use
> > > > > > > > > pagination parameters, I think it should be fine to enable
> > > > > > > > > that behaviour with a feature flag.
> > > > > > > > >
> > > > > > > > > WDYT?
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Dmitri.
> > > > > > > > >
> > > > > > > > > On Fri, Oct 10, 2025 at 8:08 PM Andrew Guterman
> > > > > > > > > <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > > Hey folks,
> > > > > > > > > >
> > > > > > > > > > I wanted to gauge sentiment on adding pagination to non-IRC
> > > > > > > > > > APIs, such as the management APIs, as the number of
> > > > > > > > > > management entities (catalogs, principals, etc) can grow
> > > > > > > > > > large and become un-listable all at once.
> > > > > > > > > >
> > > > > > > > > > I'm not sure if this has been discussed previously but I
> > > > > > > > > > couldn't find a thread nor PRs related to it.
> > > > > > > > > >
> > > > > > > > > > My proposal is to not reinvent the wheel and just re-use
> > > > > > > > > > the spec and implementation of the IRC APIs, where requests
> > > > > > > > > > contain a "page-token" and "page-size" param, and responses
> > > > > > > > > > return a "next-page-token".
> > > > > > > > > >
> > > > > > > > > > Let me know what you think.
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Andrew
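
On Eric's point, quoted above, that a short page does not imply the end of the listing: a client-side loop can be sketched as follows, in Python for brevity. `fetch_page` is a hypothetical callable standing in for whatever HTTP call a real client makes; it is not part of any Polaris or Iceberg client API. The loop stops only when the server returns no next page token, however many elements each page holds.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# (items on this page, next page token or None when the listing is complete)
Page = Tuple[List[str], Optional[str]]


def list_all(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[str]:
    token: Optional[str] = None  # no token yet on the first request
    while True:
        items, token = fetch_page(token)
        # A short or even empty page is fine; only a missing token ends the loop.
        yield from items
        if not token:
            return
```

A server that caps page sizes below the requested value, or that returns an empty intermediate page, is handled by the same loop without special cases.
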
