stark256-spec opened a new pull request, #3454: URL: https://github.com/apache/iceberg-python/pull/3454
## Problem

`list_tables`, `list_views`, and `list_namespaces` in the REST catalog eagerly collect every page before returning, even if the caller only needs the first few results. In namespaces with thousands of tables this creates unnecessary network round-trips and latency before the first result is visible.

Closes #3365

## Solution

Adds `PaginationList[T]` (`pyiceberg/utils/pagination.py`), a `list` subclass that pre-loads the first page and lazily fetches subsequent pages only as the caller iterates past items already in memory.

### Design

| Operation | Behaviour |
|-----------|-----------|
| `for item in result` | Lazy: next page fetched only when iterator exhausts current buffer |
| `result[0]` / `result[2]` | Lazy: fetches pages until the requested index is available |
| `result[-1]` / `result[1:3]` / `len(result)` / `x in result` / `result == other` | Eager: fetches all remaining pages |
| `isinstance(result, list)` | `True`: full backward compatibility |

### Key properties

- **Zero breaking changes**: `PaginationList` subclasses `list`, so all existing call sites that iterate, compare, or extend the return value continue to work without modification.
- **First page always pre-loaded**: Callers that only look at the first few items pay zero extra latency compared to the old implementation.
- **Single fetch per page**: Each page token is consumed at most once; no redundant requests.

## Changes

- `pyiceberg/utils/pagination.py`: new `PaginationList[T]` class
- `pyiceberg/catalog/rest/__init__.py`: `list_tables`, `list_views`, `list_namespaces` refactored to return `PaginationList`
- `tests/utils/test_pagination.py`: 14 unit tests for all `PaginationList` operations
- `tests/catalog/test_rest.py`: `test_list_tables_returns_pagination_list` verifies lazy behaviour (call count stays at 1 while iterating within the first page, rises to 2 only after crossing the page boundary)

--
This is an automated message from the Apache Git Service.
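The lazy/eager split in the Design table can be illustrated with a minimal sketch. This is not the code from the pull request: the class name `LazyPageList`, the `fetch_page(token) -> (items, next_token)` callable, and the in-memory `PAGES` fixture are all hypothetical stand-ins, and only a subset of the list operations is covered.

```python
from typing import Callable, Iterator, List, Optional, Tuple, TypeVar

T = TypeVar("T")


class LazyPageList(List[T]):
    """Hypothetical sketch: a list that fetches further pages on demand."""

    def __init__(
        self, fetch_page: Callable[[Optional[str]], Tuple[List[T], Optional[str]]]
    ) -> None:
        super().__init__()
        self._fetch_page = fetch_page  # assumed shape: token -> (items, next_token)
        self._next_token: Optional[str] = None
        self._exhausted = False
        self._load_next_page()  # the first page is always pre-loaded

    def _load_next_page(self) -> bool:
        """Fetch one more page; return False once no pages remain."""
        if self._exhausted:
            return False
        items, token = self._fetch_page(self._next_token)
        list.extend(self, items)
        self._next_token = token
        self._exhausted = token is None
        return True

    def _load_all(self) -> None:
        while self._load_next_page():
            pass

    def __iter__(self) -> Iterator[T]:
        i = 0
        while True:
            if i < list.__len__(self):
                yield list.__getitem__(self, i)
                i += 1
            elif not self._load_next_page():
                return

    def __getitem__(self, index):
        if isinstance(index, int) and index >= 0:
            # Lazy: fetch only until the requested index is buffered.
            while index >= list.__len__(self) and self._load_next_page():
                pass
        else:
            # Negative indices and slices need the full length.
            self._load_all()
        return list.__getitem__(self, index)

    def __len__(self) -> int:
        self._load_all()
        return list.__len__(self)

    def __contains__(self, item) -> bool:
        self._load_all()
        return list.__contains__(self, item)

    def __eq__(self, other) -> bool:
        self._load_all()
        if isinstance(other, LazyPageList):
            other._load_all()
        return list.__eq__(self, other)


# Hypothetical three-page listing keyed by page token (None = first page).
PAGES = {None: (["a", "b"], "t1"), "t1": (["c", "d"], "t2"), "t2": (["e"], None)}
calls = []


def fake_fetch(token):
    calls.append(token)
    return PAGES[token]


result = LazyPageList(fake_fetch)
assert result[0] == "a" and len(calls) == 1  # served from the first page
assert result[2] == "c" and len(calls) == 2  # crossing the boundary fetches page 2
assert len(result) == 5 and len(calls) == 3  # len() drains the remaining page
```

In this sketch each page token is passed to the fetcher at most once, because the token is overwritten as soon as its page has been appended to the buffer.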
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
