asf-tooling commented on issue #465:
URL:
https://github.com/apache/tooling-trusted-releases/issues/465#issuecomment-4410276297
<!-- gofannon-issue-triage-bot v2 -->
**Automated triage** — analyzed at `main@2da7807a`
**Type:** `refactor` • **Classification:** `actionable` •
**Confidence:** `high`
**Application domain(s):** `project_committee_management`,
`shared_infrastructure`
### Summary
The issue identifies two data model problems in `atr/datasources/apache.py`:
(1) the `Committee` Pydantic model is missing a `charter` field that exists in
the upstream Whimsy committee-info.json data, and (2) the `ProjectStatus` model
has a `charter` field even though only PMCs have charters. @dave2wave confirmed
this needs refactoring. The fix involves adding `charter` to the `Committee`
datasource model, adding it to the SQL `Committee` model, storing it during
metadata updates, and potentially documenting that `charter` in `ProjectStatus`
exists only for upstream data compatibility.
### Where this lives in the code today
#### `atr/datasources/apache.py` — `Committee` (lines 83-94)
_needs modification_
This Pydantic model for parsing Whimsy committee-info.json is missing the
`charter` field that PMCs have in the upstream data.
```python
class Committee(schema.Strict):
name: str
display_name: str
site: str | None
description: str | None
mail_list: str
established: str | None
report: list[str]
chair: Annotated[list[User], helpers.DictToList(key="id")]
roster_count: int
roster: Annotated[list[User], helpers.DictToList(key="id")]
pmc: bool
```
#### `atr/datasources/apache.py` — `ProjectStatus` (lines 192-215)
_needs modification_
This Pydantic model for projects.apache.org data has a `charter` field, but
semantically charters belong to PMCs not projects. The field exists for
upstream data compatibility.
```python
class ProjectStatus(schema.Strict):
category: list[str] = schema.factory(list)
created: str | None = None
description: str | None = None
programming_language: list[str] =
schema.Field(alias="programming-language", default_factory=list)
doap: str | None = None
homepage: str
name: str
pmc: str | None
shortdesc: str | None = None
repository: list[str | dict] = schema.factory(list)
release: list[Release] = schema.factory(list)
bug_database: str | None = schema.alias_opt("bug-database")
download_page: str | None = schema.alias_opt("download-page")
license: str | None = None
mailing_list: str | None = schema.alias_opt("mailing-list")
maintainer: list[MaintainerInfo] = schema.factory(list)
implements: list[ImplementsInfo] = schema.factory(list)
same_as: str | None = schema.alias_opt("sameAs")
developer: list[MaintainerInfo] = schema.factory(list)
modified: str | None = None
chair: ChairInfo | None = None
charter: str | None = None
vendor: str | None = None
```
#### `atr/datasources/apache.py` — `_update_committees` (lines 374-406)
_needs modification_
When committee_info is found, only `display_name` is stored. The charter
should also be stored here.
```python
async def _update_committees(
data: db.Session, ldap_projects: LDAPProjectsData, committees_by_name:
Mapping[str, Committee]
) -> tuple[int, int]:
added_count = 0
updated_count = 0
# First create PMC committees
for project in ldap_projects.projects:
name = project.name
# Skip non-PMC committees
if project.pmc is not True:
continue
# Get or create PMC
committee = await data.committee(key=name).get()
if not committee:
committee = sql.Committee(key=name)
data.add(committee)
added_count += 1
else:
updated_count += 1
committee.committee_members = project.owners
committee.committers = project.members
# We create PMCs for now
committee.is_podling = False
committee_info = committees_by_name.get(name)
if committee_info:
committee.name = committee_info.display_name
updated_count += 1
return added_count, updated_count
```
#### `atr/models/sql.py` — `Committee` (lines 659-680)
_needs modification_
The SQL Committee model is missing a `charter` field to persist the board
resolution charter text.
```python
class Committee(sqlmodel.SQLModel, table=True):
key: str = sqlmodel.Field(unique=True, primary_key=True,
**example("example"))
name: str | None = sqlmodel.Field(default=None, **example("Example"))
# True only if this is an incubator podling with a PPMC
is_podling: bool = sqlmodel.Field(default=False)
# 1-M: Committee -> [Committee]
# M-1: Committee -> Committee
child_committees: list["Committee"] = sqlmodel.Relationship(
sa_relationship_kwargs=dict(
backref=orm.backref("parent_committee",
remote_side="Committee.key"),
),
)
# M-1: Committee -> Committee
# 1-M: Committee -> [Committee]
parent_committee_key: str | None = sqlmodel.Field(default=None,
foreign_key="committee.key")
# parent_committee: Optional["Committee"]
# 1-M: Committee -> [Project]
# M-1: Project -> Committee
projects: list["Project"] =
sqlmodel.Relationship(back_populates="committee")
```
### Where new code would go
- `atr/datasources/apache.py` — after symbol Committee
Add `charter` field to the Committee Pydantic model
- `atr/models/sql.py` — after field `name` in Committee SQL model
Add `charter` column to persist the committee charter from board resolution
### Proposed approach
The refactoring has three parts: (1) Add `charter: str | None = None` to the
`Committee` Pydantic model in `atr/datasources/apache.py` so that charter data
from Whimsy's committee-info.json is parsed. (2) Add a `charter` field to the
SQL `Committee` model and create an Alembic migration. (3) Store the charter in
`_update_committees()` when `committee_info` is available. The `charter` field
on `ProjectStatus` should remain (with a comment) because the upstream
projects.apache.org JSON may include it for data compatibility, but it should
not be stored on `Project` records. An alternative would be to transfer the
charter from `ProjectStatus` to the committee during `_update_projects()`, but
the cleaner approach is to source it from committee-info.json via
`CommitteeData`.
### Suggested patches
#### `atr/datasources/apache.py`
Add charter field to the Committee Pydantic model for parsing Whimsy data
````diff
--- a/atr/datasources/apache.py
+++ b/atr/datasources/apache.py
@@ -78,6 +78,7 @@ class Committee(schema.Strict):
name: str
display_name: str
site: str | None
+ charter: str | None = None
description: str | None
mail_list: str
established: str | None
````
#### `atr/datasources/apache.py`
Store charter from committee-info during metadata update, and add clarifying
comment on ProjectStatus.charter
````diff
--- a/atr/datasources/apache.py
+++ b/atr/datasources/apache.py
@@ -237,6 +237,7 @@ class ProjectStatus(schema.Strict):
chair: ChairInfo | None = None
- charter: str | None = None
+ charter: str | None = None # Present in upstream data but semantically
belongs to PMC, not project
vendor: str | None = None
@@ -365,6 +366,7 @@ async def _update_committees(
committee_info = committees_by_name.get(name)
if committee_info:
committee.name = committee_info.display_name
+ committee.charter = committee_info.charter
updated_count += 1
````
#### `atr/models/sql.py`
Add charter column to the SQL Committee model to persist the board
resolution text
````diff
--- a/atr/models/sql.py
+++ b/atr/models/sql.py
@@ -411,6 +411,7 @@ class Committee(sqlmodel.SQLModel, table=True):
key: str = sqlmodel.Field(unique=True, primary_key=True,
**example("example"))
name: str | None = sqlmodel.Field(default=None, **example("Example"))
+ charter: str | None = sqlmodel.Field(default=None)
# True only if this is an incubator podling with a PPMC
is_podling: bool = sqlmodel.Field(default=False)
````
#### `tests/unit/datasources/testdata/committees.json`
Add charter field to test data to verify parsing
````diff
--- a/tests/unit/datasources/testdata/committees.json
+++ b/tests/unit/datasources/testdata/committees.json
@@ -5,6 +5,7 @@
"committees": {
"tooling": {
"display_name": "Tooling",
+ "charter": "The mission of the Apache Tooling project is the creation
and maintenance of tools for the ASF.",
"site": "http://tooling.apache.org/",
"description": "tools, tools, tools",
"mail_list": "tooling",
````
#### `tests/unit/datasources/test_apache.py`
Add assertion to verify charter is parsed correctly
````diff
--- a/tests/unit/datasources/test_apache.py
+++ b/tests/unit/datasources/test_apache.py
@@ -43,6 +43,7 @@ def test_committee_data_model():
assert tooling.name == "tooling"
assert len(tooling.roster) == 3
assert "tn" in map(lambda x: x.id, tooling.roster)
+ assert tooling.charter == "The mission of the Apache Tooling project is
the creation and maintenance of tools for the ASF."
assert len(tooling.chair) == 1
assert "wave" in map(lambda x: x.id, tooling.chair)
````
### Open questions
- An Alembic migration will be needed to add the `charter` column to the
`committee` table. The exact migration file is not included here as it depends
on the existing migration history.
- Should the `charter` field from `ProjectStatus` be actively
ignored/dropped, or is it acceptable to keep parsing it for data compatibility
but simply not persist it?
- The upstream Whimsy committee-info.json format should be verified to
confirm the field name is exactly `charter` — the issue mentions
committee-info.yaml but the code fetches JSON from Whimsy.
### Files examined
- `atr/datasources/apache.py`
- `tests/unit/datasources/test_apache.py`
- `tests/unit/datasources/testdata/committees.json`
- `atr/models/sql.py`
- `atr/db/__init__.py`
- `atr/storage/__init__.py`
- `atr/principal.py`
- `atr/registry.py`
### Related issues
This issue appears related to: #468, #462.
_All address data model issues with committee and project metadata schema_
---
*Draft from a triage agent. A human reviewer should validate before merging
any change. The agent did not run tests or verify diffs apply.*
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]