asf-tooling commented on issue #470:
URL:
https://github.com/apache/tooling-trusted-releases/issues/470#issuecomment-4410245868
<!-- gofannon-issue-triage-bot v2 -->
**Automated triage** — analyzed at `main@2da7807a`
**Type:** `new_feature` • **Classification:** `actionable` •
**Confidence:** `medium`
**Application domain(s):** `project_committee_management`,
`shared_infrastructure`
### Summary
Issue #470 requests comprehensive validation coverage for every field in the
Committee and Project SQL models. The codebase already has a validation
framework in `atr/validate.py` with validators for several Committee fields
(child_committees, name) and Project fields (category, committee_key, created,
name, programming_languages, release_policy_id). However, several fields lack
explicit validators: Committee's `key`, `is_podling`, `committee_members`,
`committers`, `release_managers`, `parent_committee_key`; and Project's `key`,
`status`, `description`, `version_method`, `version_pattern`, `cycle_match`,
`branch_template`, `created_by`. The documentation in
`atr/docs/input-validation.md` already partially documents existing validation
but would need updating to comprehensively cover each field.
### Where this lives in the code today
#### `atr/validate.py` — `committee` (lines 92-96)
_needs modification_
This is the entry point for committee validation; it currently only checks
child_committees and name, missing key, is_podling, members, committers,
release_managers, parent_committee_key.
```python
def committee(c: sql.Committee) -> AnnotatedDivergences:
"""Check that a committee is valid."""
yield from committee_child_committees(c)
yield from committee_full_name(c)
```
#### `atr/validate.py` — `project` (lines 187-195)
_needs modification_
Entry point for project validation; currently validates 6 fields but is
missing validators for key format, status, description, version_method,
version_pattern, cycle_match, branch_template, and created_by.
```python
def project(p: sql.Project) -> AnnotatedDivergences:
"""Check that a project is valid."""
yield from project_category(p)
yield from project_committee(p)
yield from project_created(p)
yield from project_full_name(p)
yield from project_programming_languages(p)
yield from project_release_policy(p)
```
#### `atr/validate.py` — `committee_components` (lines 55-71)
_extension point_
Decorator used to annotate committee divergence validators; new validators
should use this pattern.
```python
def committee_components(
*components: str,
) -> Callable[[CommitteeDivergences], CommitteeAnnotatedDivergences]:
"""Wrap a Committee divergence generator to yield annotated
divergences."""
def wrap(original: CommitteeDivergences) ->
CommitteeAnnotatedDivergences:
def replacement(c: sql.Committee) -> AnnotatedDivergences:
yield from divergences_with_annotations(
components,
original.__name__,
c.key,
original(c),
)
return replacement
return wrap
```
#### `atr/validate.py` — `project_components` (lines 73-89)
_extension point_
Decorator used to annotate project divergence validators; new validators
should use this pattern.
```python
def project_components(
*components: str,
) -> Callable[[ProjectDivergences], ProjectAnnotatedDivergences]:
"""Wrap a Project divergence generator to yield annotated
divergences."""
def wrap(original: ProjectDivergences) ->
ProjectAnnotatedDivergences:
def replacement(p: sql.Project) -> AnnotatedDivergences:
yield from divergences_with_annotations(
components,
original.__name__,
str(p.key),
original(p),
)
return replacement
return wrap
```
#### `atr/models/sql.py` — `Committee` (lines 659-694)
_currently does this_
The Committee SQL model definition showing all fields that need validation
coverage.
```python
class Committee(sqlmodel.SQLModel, table=True):
key: str = sqlmodel.Field(unique=True, primary_key=True,
**example("example"))
name: str | None = sqlmodel.Field(default=None, **example("Example"))
# True only if this is an incubator podling with a PPMC
is_podling: bool = sqlmodel.Field(default=False)
# 1-M: Committee -> [Committee]
# M-1: Committee -> Committee
child_committees: list["Committee"] = sqlmodel.Relationship(
sa_relationship_kwargs=dict(
backref=orm.backref("parent_committee",
remote_side="Committee.key"),
),
)
# M-1: Committee -> Committee
# 1-M: Committee -> [Committee]
parent_committee_key: str | None = sqlmodel.Field(default=None,
foreign_key="committee.key")
# parent_committee: Optional["Committee"]
# 1-M: Committee -> [Project]
# M-1: Project -> Committee
projects: list["Project"] =
sqlmodel.Relationship(back_populates="committee")
committee_members: list[str] = sqlmodel.Field(
default_factory=list,
sa_column=sqlalchemy.Column(sqlalchemy.JSON, nullable=False),
**example(["sbp", "arm", "wave"]),
)
committers: list[str] = sqlmodel.Field(
default_factory=list,
sa_column=sqlalchemy.Column(sqlalchemy.JSON, nullable=False),
**example(["sbp", "arm", "wave"]),
)
release_managers: list[str] = sqlmodel.Field(
default_factory=list, sa_column=sqlalchemy.Column(sqlalchemy.JSON,
nullable=False), **example(["wave"])
)
```
#### `atr/models/sql.py` — `Project` (lines 714-764)
_currently does this_
The Project SQL model definition showing all fields that need validation
coverage.
```python
class Project(sqlmodel.SQLModel, table=True):
key: str = sqlmodel.Field(primary_key=True, unique=True,
**example("example"))
# TODO: Ideally full_name would be unique for str only, but that's
complex
# We always include "Apache" in the full_name
name: str | None = sqlmodel.Field(default=None, **example("Apache
Example"))
status: ProjectStatus = sqlmodel.Field(default=ProjectStatus.ACTIVE,
**example(ProjectStatus.ACTIVE))
# M-1: Project -> Project
# 1-M: (Project.child_project is missing, would be Project -> [Project])
super_project_key: str | None = sqlmodel.Field(default=None,
foreign_key="project.key")
# NOTE: Neither "Project" | None nor "Project | None" works
super_project: Optional["Project"] = sqlmodel.Relationship()
description: str | None = sqlmodel.Field(default=None,
**example("Example is a simple example project"))
category: str | None = sqlmodel.Field(default=None,
**example("data,storage"))
programming_languages: str | None = sqlmodel.Field(default=None,
**example("c,python"))
# Version-scheme metadata (#912). For "simple" projects the pattern
fields are null
# and the project keeps a single "default" cycle.
version_method: VersionMethod =
sqlmodel.Field(default=VersionMethod.SIMPLE, **example(VersionMethod.SIMPLE))
version_pattern: str | None = sqlmodel.Field(default=None,
**example(r"^\d+\.\d+\.\d+$"))
cycle_match: str | None = sqlmodel.Field(default=None,
**example(r"^(\d+)\.\d+\.\d+$"))
branch_template: str | None = sqlmodel.Field(default=None,
**example("release-{cycle}"))
# M-1: Project -> Committee
# 1-M: Committee -> [Project]
committee_key: str | None = sqlmodel.Field(default=None,
foreign_key="committee.key", **example("example"))
committee: Committee | None =
sqlmodel.Relationship(back_populates="projects")
# 1-M: Project -> [Release]
releases: list["Release"] =
sqlmodel.Relationship(back_populates="project")
# 1-M: Project -C-> [ProjectCycle]
cycles: list["ProjectCycle"] = sqlmodel.Relationship(
back_populates="project",
cascade_delete=True,
sa_relationship_kwargs={"cascade": "all, delete-orphan"},
)
release_policy_id: int | None = sqlmodel.Field(default=None,
foreign_key="releasepolicy.id", ondelete="CASCADE")
release_policy: Optional["ReleasePolicy"] = sqlmodel.Relationship(
cascade_delete=True, sa_relationship_kwargs={"cascade": "all,
delete-orphan", "single_parent": True}
)
created: datetime.datetime = sqlmodel.Field(
default_factory=lambda: datetime.datetime.now(datetime.UTC),
sa_column=sqlalchemy.Column(UTCDateTime, nullable=False),
**example(datetime.datetime(2025, 5, 1, 1, 2, 3,
tzinfo=datetime.UTC)),
)
created_by: str | None = sqlmodel.Field(default=None, **example("user"))
```
#### `atr/shared/projects.py` — `AddProjectForm` (lines 42-53)
_currently does this_
Input-time validation for project creation; validates display_name format
and label format but doesn't cover all Project fields.
```python
class AddProjectForm(form.Form):
committee_key: safe.CommitteeKey = form.label("Committee name",
widget=form.Widget.HIDDEN)
display_name: str = form.label(
"Display name",
'For example, "Apache Example" or "Apache Example Components". '
'You must start with "Apache " and you must use title case.',
)
label: str = form.label(
"Label",
'For example, "example" or "example-components". '
"You must start with your committee label, and you must use lower
case.",
)
```
### Where new code would go
- `atr/validate.py` — after symbol committee_full_name
Add new committee field validators for key format, committee_members,
committers, release_managers.
- `atr/validate.py` — after symbol project_release_policy
Add new project field validators for key format, status, description
length, version_pattern regex validity, created_by format.
### Proposed approach
The approach should be twofold: (1) Add missing validators in
`atr/validate.py` for Committee and Project fields that currently lack data
integrity checks, and (2) update the documentation in
`atr/docs/input-validation.md` to comprehensively list what validation exists
for each field.
For Committee, add validators for: `key` (must be lowercase alphanumeric
with hyphens), `committee_members`/`committers`/`release_managers` (each entry
must match ASF UID pattern), and `release_managers` subset of
`committee_members`. For Project, add validators for: `key` (must be lowercase
alphanumeric with hyphens, matching committee prefix), `status` (must be a
valid enum value), `version_pattern`/`cycle_match` (if set, must be valid
regex), `branch_template` (if set, must contain `{cycle}` placeholder), and
`created_by` (if set, must match ASF UID pattern). Input-time validation is
already handled by `safe.CommitteeKey`/`safe.ProjectKey` types and the
`AddProjectForm` model validator, so the gap is primarily in the data-integrity
layer.
### Suggested patches
#### `atr/validate.py`
Add missing committee field validators for key format, member lists, and
release_managers subset check.
````diff
--- a/atr/validate.py
+++ b/atr/validate.py
@@ -22,6 +22,7 @@
import asyncio
import datetime
+import re
from collections.abc import AsyncGenerator, Callable, Generator, Iterable,
Sequence
from typing import NamedTuple, TypeVar
@@ -84,9 +85,14 @@
def committee(c: sql.Committee) -> AnnotatedDivergences:
"""Check that a committee is valid."""
yield from committee_child_committees(c)
yield from committee_full_name(c)
+ yield from committee_key_format(c)
+ yield from committee_members_format(c)
+ yield from committee_committers_format(c)
+ yield from committee_release_managers_format(c)
+ yield from committee_release_managers_subset(c)
@committee_components("Committee.child_committees")
@@ -120,6 +126,60 @@
)
+_ASF_UID_PATTERN = re.compile(r"^[-_a-z0-9]+$")
+
+
+@committee_components("Committee.key")
+def committee_key_format(c: sql.Committee) -> Divergences:
+ """Check that the committee key uses lowercase alphanumeric with
hyphens."""
+
+ def okay(key: str) -> bool:
+ return bool(re.match(r"^[a-z][a-z0-9-]*$", key))
+
+ yield from divergences_predicate(okay, "lowercase alphanumeric key
starting with a letter", c.key)
+
+
+@committee_components("Committee.committee_members")
+def committee_members_format(c: sql.Committee) -> Divergences:
+ """Check that all committee_members entries are valid ASF UIDs."""
+
+ for uid in c.committee_members:
+ if not _ASF_UID_PATTERN.match(uid):
+ yield Divergence(f"valid ASF UID for entry", repr(uid))
+
+
+@committee_components("Committee.committers")
+def committee_committers_format(c: sql.Committee) -> Divergences:
+ """Check that all committers entries are valid ASF UIDs."""
+
+ for uid in c.committers:
+ if not _ASF_UID_PATTERN.match(uid):
+ yield Divergence(f"valid ASF UID for entry", repr(uid))
+
+
+@committee_components("Committee.release_managers")
+def committee_release_managers_format(c: sql.Committee) -> Divergences:
+ """Check that all release_managers entries are valid ASF UIDs."""
+
+ for uid in c.release_managers:
+ if not _ASF_UID_PATTERN.match(uid):
+ yield Divergence(f"valid ASF UID for entry", repr(uid))
+
+
+@committee_components("Committee.release_managers",
"Committee.committee_members")
+def committee_release_managers_subset(c: sql.Committee) -> Divergences:
+ """Check that all release_managers are also committee_members."""
+
+ members_set = set(c.committee_members)
+ for uid in c.release_managers:
+ if uid not in members_set:
+ yield Divergence(
+ f"release_manager '{uid}' to be in committee_members",
+ f"'{uid}' not in committee_members",
+ )
+
+
def committees(cs: Iterable[sql.Committee]) -> AnnotatedDivergences:
"""Validate multiple committees."""
for c in cs:
````
#### `atr/validate.py`
Add missing project field validators for key format, version_pattern regex
validity, and created_by UID format.
````diff
--- a/atr/validate.py
+++ b/atr/validate.py
@@ -153,12 +207,15 @@
def project(p: sql.Project) -> AnnotatedDivergences:
"""Check that a project is valid."""
yield from project_category(p)
yield from project_committee(p)
yield from project_created(p)
+ yield from project_created_by(p)
yield from project_full_name(p)
+ yield from project_key_format(p)
yield from project_programming_languages(p)
yield from project_release_policy(p)
+ yield from project_version_pattern(p)
@project_components("Project.category")
@@ -200,6 +257,42 @@
yield from divergences_predicate(okay, expected, p.name)
+@project_components("Project.key")
+def project_key_format(p: sql.Project) -> Divergences:
+ """Check that the project key uses lowercase alphanumeric with
hyphens."""
+
+ def okay(key: str) -> bool:
+ return bool(re.match(r"^[a-z][a-z0-9-]*$", key))
+
+ yield from divergences_predicate(okay, "lowercase alphanumeric key
starting with a letter", p.key)
+
+
+@project_components("Project.created_by")
+def project_created_by(p: sql.Project) -> Divergences:
+ """Check that created_by is a valid ASF UID if set."""
+
+ def okay(uid: str | None) -> bool:
+ if uid is None:
+ return True
+ return bool(_ASF_UID_PATTERN.match(uid))
+
+ yield from divergences_predicate(okay, "valid ASF UID or None",
p.created_by)
+
+
+@project_components("Project.version_pattern")
+def project_version_pattern(p: sql.Project) -> Divergences:
+ """Check that version_pattern is a valid regex if set."""
+
+ def okay(pattern: str | None) -> bool:
+ if pattern is None:
+ return True
+ try:
+ re.compile(pattern)
+ return True
+ except re.error:
+ return False
+
+ yield from divergences_predicate(okay, "valid regex pattern or None",
p.version_pattern)
+
+
@project_components("Project.programming_languages")
def project_programming_languages(p: sql.Project) -> Divergences:
````
### Open questions
- Should `Committee.is_podling` be validated (e.g., checking that only
committees under the Incubator parent have is_podling=True)?
- Should `Project.cycle_match` and `Project.branch_template` also have
explicit validators (regex validity and template placeholder presence)?
- The issue mentions 'document and confirm' - does this require a separate
tracking document listing each field and its validation status, or is updating
`atr/docs/input-validation.md` sufficient?
- Should the key format regex be `^[a-z][a-z0-9-]*$` or match the existing
`safe.CommitteeKey`/`safe.ProjectKey` validation patterns exactly? Need to
check what those types enforce.
- Is there input-time validation needed for Committee fields
(committee_members, committers, release_managers) or are those only populated
programmatically from LDAP?
### Files examined
- `atr/models/sql.py`
- `atr/validate.py`
- `atr/form.py`
- `atr/models/schema.py`
- `atr/models/validation.py`
- `atr/shared/projects.py`
- `atr/docs/input-validation.md`
- `atr/post/projects.py`
---
*Draft from a triage agent. A human reviewer should validate before merging
any change. The agent did not run tests or verify diffs apply.*
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]