asf-tooling commented on issue #1062:
URL:
https://github.com/apache/tooling-trusted-releases/issues/1062#issuecomment-4409859432
<!-- gofannon-issue-triage-bot v2 -->
**Automated triage** — analyzed at `main@2da7807a`
**Type:** `new_feature` • **Classification:** `actionable` •
**Confidence:** `high`
**Application domain(s):** `web_api_infrastructure`, `admin_operations`,
`shared_infrastructure`
### Summary
Multiple API endpoints return unbounded result sets without pagination. The
issue is confirmed by a TODO comment in checks_list() and by the absence of
limit/offset on several endpoints. The codebase already has an established
pagination pattern (ReleasesListQuery, SshKeysListQuery, TasksListQuery
dataclasses with limit/offset) and a pagination_args_validate utility, so this
is a matter of applying the existing pattern to the affected endpoints.
### Where this lives in the code today
#### `atr/api/__init__.py` — `checks_list` (lines 67-98)
_needs modification_
This endpoint explicitly acknowledges the need for pagination via the TODO
comment and returns all check results unbounded.
```python
@api.typed
@quart_schema.validate_response(models.api.ChecksListResults, 200)
async def checks_list(
_checks_list: Literal["checks/list"],
project_key: safe.ProjectKey,
version_key: safe.VersionKey,
) -> DictResponse:
"""
URL: GET /checks/list/<project_key>/<version_key>
List checks by project and version.
Checks are only conducted during the compose a draft phase. This endpoint
only returns the checks for the most recent draft revision. Once a
release
has been promoted to the vote phase or beyond, the checks returned are
still those for the compose phase.
Warning: the check results include results for archive members, so there
may potentially be thousands or results or more.
"""
# TODO: We should perhaps paginate this
async with db.session() as data:
release_key = sql.release_key(str(project_key), str(version_key))
release = await
data.release(key=release_key).demand(exceptions.NotFound(f"Release
{release_key} not found"))
check_results = await interaction.checks_for(release,
caller_data=data)
return models.api.ChecksListResults(
endpoint="/checks/list",
checks=check_results,
checks_revision=release.unwrap_revision_number,
current_phase=release.phase,
).model_dump(mode="json"), 200
```
#### `atr/models/validation.py` — `pagination_args_validate` (lines 70-86)
_currently does this_
Pagination validation utility already exists and enforces max limit of 1000
and max offset of 1000000.
```python
def pagination_args_validate(query_args: Any) -> None:
# Users could request any amount using limit=N with arbitrarily high N
# We therefore limit the maximum limit to 1000
if hasattr(query_args, "limit"):
limit = query_args.limit
if limit > 1000:
raise ValueError("Maximum limit of 1000 exceeded")
elif limit < 1:
raise ValueError("Minimum limit less than 1 is nonsense")
# Users could request any amount using offset=N with arbitrarily high N
# We therefore limit the maximum offset to 1000000
if hasattr(query_args, "offset"):
offset = query_args.offset
if offset > 1000000:
raise ValueError("Maximum offset of 1000000 exceeded")
elif offset < 0:
raise ValueError("Minimum offset less than 0 is nonsense")
```
#### `atr/models/api.py` — `ReleasesListQuery` (lines 550-554)
_currently does this_
Established pattern for pagination query parameters using a dataclass parsed
from query string.
```python
@dataclasses.dataclass
class ReleasesListQuery:
offset: int = 0
limit: int = 20
phase: str | None = None
```
#### `atr/models/api.py` — `ReleasesListResults` (lines 557-560)
_currently does this_
Established pattern for paginated API response including total count.
```python
class ReleasesListResults(schema.Strict):
endpoint: Literal["/releases/list"] = schema.alias("endpoint")
data: Sequence[sql.Release]
count: int
```
#### `atr/models/api.py` — `ChecksListResults` (lines 36-45)
_needs modification_
Response model needs additional fields for pagination metadata (count,
limit, offset).
```python
class ChecksListResults(schema.Strict):
endpoint: Literal["/checks/list"] = schema.alias("endpoint")
checks: Sequence[sql.CheckResult]
checks_revision: str = schema.example("00005")
current_phase: sql.ReleasePhase =
schema.example(sql.ReleasePhase.RELEASE_CANDIDATE)
@pydantic.field_validator("current_phase", mode="before")
@classmethod
def current_phase_to_enum(cls, v):
return sql.ReleasePhase(v) if isinstance(v, str) else v
```
#### `atr/admin/__init__.py` — `consistency` (lines 145-173)
_needs modification_
Admin consistency endpoint loads all releases and walks the entire
filesystem without limits.
```python
@admin.typed
async def consistency(_session: web.Committer, _consistency:
Literal["consistency"]) -> web.TextResponse:
"""
URL: GET /consistency
Check for consistency between the database and the filesystem.
"""
# Get all releases from the database
async with db.session() as data:
releases = await data.release().all()
database_dirs = []
for release in releases:
path = paths.release_directory_version(release)
database_dirs.append(str(path))
if len(set(database_dirs)) != len(database_dirs):
raise base.ASFQuartException("Duplicate release directories in
database", errorcode=500)
# Get all releases from the filesystem
filesystem_dirs = await _get_filesystem_dirs()
# Pair them up where possible
paired_dirs = []
for database_dir in database_dirs[:]:
for filesystem_dir in filesystem_dirs[:]:
if database_dir == filesystem_dir:
paired_dirs.append(database_dir)
database_dirs.remove(database_dir)
filesystem_dirs.remove(filesystem_dir)
break
```
#### `atr/blueprints/common.py` — `build_api_path` (lines 122-136)
_currently does this_
The API path builder already supports dataclass parameters parsed from query
strings, which is how pagination query params are passed.
```python
def build_api_path(
func: Callable[..., Any],
) -> tuple[
str,
list[tuple[str, type]],
dict[str, str],
tuple[str, type[pydantic.BaseModel]] | None,
tuple[str, type] | None,
tuple[str, type] | None,
list[str],
]:
"""Inspect a function's type hints to build a URL path for an API route.
Accepts URL path params for data, Literal strings for plain URL text,
dataclasses for GET query params
and Pydantic model params for POST bodies
```
### Where new code would go
- `atr/models/api.py` — after symbol ReleasesListQuery
Add ChecksListQuery dataclass following the established pattern for
pagination query parameters.
### Proposed approach
Apply the existing pagination pattern (dataclass query params + validation +
count in response) to the checks_list endpoint as the highest-priority fix. The
codebase already has ReleasesListQuery, SshKeysListQuery, and TasksListQuery as
precedents. Add a ChecksListQuery dataclass with limit/offset defaults, call
pagination_args_validate on it, then slice the results from
interaction.checks_for() and return the total count. The ChecksListResults
model gets additional optional fields (count, limit, offset) so existing
clients aren't broken.
For the admin endpoints (_data_browse and consistency), these are lower
priority since they're admin-only and already behind authentication. A simple
hard limit on _data_browse records and a note about consistency being
inherently unbounded (it's a diagnostic tool) would suffice for now.
### Suggested patches
#### `atr/models/api.py`
Add ChecksListQuery dataclass for pagination params and extend
ChecksListResults with pagination metadata.
````diff
--- a/atr/models/api.py
+++ b/atr/models/api.py
@@ -30,11 +30,19 @@ class ResultsTypeError(TypeError):
pass
[email protected]
+class ChecksListQuery:
+ offset: int = 0
+ limit: int = 100
+
+
class ChecksListResults(schema.Strict):
endpoint: Literal["/checks/list"] = schema.alias("endpoint")
checks: Sequence[sql.CheckResult]
checks_revision: str = schema.example("00005")
current_phase: sql.ReleasePhase =
schema.example(sql.ReleasePhase.RELEASE_CANDIDATE)
+ count: int = schema.example(42)
+ limit: int = schema.example(100)
+ offset: int = schema.example(0)
@pydantic.field_validator("current_phase", mode="before")
@classmethod
````
#### `atr/api/__init__.py`
Add pagination support to checks_list using the new ChecksListQuery
dataclass and validation.
````diff
--- a/atr/api/__init__.py
+++ b/atr/api/__init__.py
@@ -63,6 +63,7 @@ async def checks_list(
_checks_list: Literal["checks/list"],
project_key: safe.ProjectKey,
version_key: safe.VersionKey,
+ query: models.api.ChecksListQuery = models.api.ChecksListQuery(),
) -> DictResponse:
"""
URL: GET /checks/list/<project_key>/<version_key>
@@ -76,16 +77,21 @@ async def checks_list(
Warning: the check results include results for archive members, so there
may potentially be thousands or results or more.
"""
- # TODO: We should perhaps paginate this
+ validation.pagination_args_validate(query)
async with db.session() as data:
release_key = sql.release_key(str(project_key), str(version_key))
release = await
data.release(key=release_key).demand(exceptions.NotFound(f"Release
{release_key} not found"))
check_results = await interaction.checks_for(release,
caller_data=data)
+ total_count = len(check_results)
+ paginated_checks = check_results[query.offset : query.offset +
query.limit]
+
return models.api.ChecksListResults(
endpoint="/checks/list",
- checks=check_results,
+ checks=paginated_checks,
checks_revision=release.unwrap_revision_number,
current_phase=release.phase,
+ count=total_count,
+ limit=query.limit,
+ offset=query.offset,
).model_dump(mode="json"), 200
````
#### `atr/api/__init__.py`
Add pagination support to checks_list_revision using the same pattern.
````diff
--- a/atr/api/__init__.py
+++ b/atr/api/__init__.py
@@ -90,6 +90,7 @@ async def checks_list_revision(
project_key: safe.ProjectKey,
version_key: safe.VersionKey,
revision: safe.RevisionNumber,
+ query: models.api.ChecksListQuery = models.api.ChecksListQuery(),
) -> DictResponse:
"""
URL: GET /checks/list/<project_key>/<version_key>/<revision>
@@ -102,6 +103,7 @@ async def checks_list_revision(
Warning: the check results include results for archive members, so there
may potentially be thousands or results or more.
"""
+ validation.pagination_args_validate(query)
async with db.session() as data:
release_key = sql.release_key(str(project_key), str(version_key))
release_result = await data.release(key=release_key).demand(
@@ -114,9 +116,14 @@ async def checks_list_revision(
check_results = await interaction.checks_for(release_result,
revision=revision, caller_data=data)
+ total_count = len(check_results)
+ paginated_checks = check_results[query.offset : query.offset +
query.limit]
+
return models.api.ChecksListResults(
endpoint="/checks/list",
- checks=check_results,
+ checks=paginated_checks,
checks_revision=str(revision),
current_phase=release_result.phase,
+ count=total_count,
+ limit=query.limit,
+ offset=query.offset,
).model_dump(mode="json"), 200
````
### Open questions
- The interaction.checks_for() function (in atr/db/interaction.py, not
provided) fetches all results in memory before slicing. For truly large result
sets, database-level LIMIT/OFFSET would be more efficient. Need to check if
interaction.checks_for can accept limit/offset parameters.
- The release_paths() endpoint is referenced in the issue but the source
wasn't provided in the truncated file. Need to see its implementation to add
pagination.
- The _data_browse() admin function implementation was not visible in the
truncated source. Its pagination would follow the same pattern but needs the
actual code.
- Adding count/limit/offset to ChecksListResults is additive but changes the
response schema. Need to confirm this won't break existing API consumers
expecting exactly the old shape.
- The consistency admin endpoint is inherently a diagnostic tool that needs
to see all data. A hard cap there may hide real inconsistencies. Perhaps a
streaming response or summary-only mode would be more appropriate.
### Files examined
- `atr/api/__init__.py`
- `atr/admin/__init__.py`
- `atr/db/__init__.py`
- `atr/models/validation.py`
- `atr/models/api.py`
- `atr/blueprints/api.py`
- `atr/blueprints/common.py`
- `atr/models/sql.py`
### Related issues
This issue appears related to: #1072.
_Both address unbounded response sizes and pagination issues on API
endpoints_
---
*Draft from a triage agent. A human reviewer should validate before merging
any change. The agent did not run tests or verify diffs apply.*
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]