asf-tooling commented on issue #465:
URL: 
https://github.com/apache/tooling-trusted-releases/issues/465#issuecomment-4410276297

   <!-- gofannon-issue-triage-bot v2 -->
   
   **Automated triage** — analyzed at `main@2da7807a`
   
   **Type:** `refactor`  •  **Classification:** `actionable`  •  
**Confidence:** `high`
   **Application domain(s):** `project_committee_management`, 
`shared_infrastructure`
   
   ### Summary
   The issue identifies two data model problems in `atr/datasources/apache.py`: 
(1) the `Committee` Pydantic model is missing a `charter` field that exists in 
the upstream Whimsy committee-info.json data, and (2) the `ProjectStatus` model 
has a `charter` field even though only PMCs have charters. @dave2wave confirmed 
this needs refactoring. The fix involves adding `charter` to the `Committee` 
datasource model, adding it to the SQL `Committee` model, storing it during 
metadata updates, and potentially documenting that `charter` in `ProjectStatus` 
exists only for upstream data compatibility.
   
   ### Where this lives in the code today
   
   #### `atr/datasources/apache.py` — `Committee` (lines 83-94)
   _needs modification_
   This Pydantic model for parsing Whimsy committee-info.json is missing the 
`charter` field that PMCs have in the upstream data.
   
   ```python
   class Committee(schema.Strict):
       name: str
       display_name: str
       site: str | None
       description: str | None
       mail_list: str
       established: str | None
       report: list[str]
       chair: Annotated[list[User], helpers.DictToList(key="id")]
       roster_count: int
       roster: Annotated[list[User], helpers.DictToList(key="id")]
       pmc: bool
   ```
   
   #### `atr/datasources/apache.py` — `ProjectStatus` (lines 192-215)
   _needs modification_
   This Pydantic model for projects.apache.org data has a `charter` field, but 
semantically charters belong to PMCs not projects. The field exists for 
upstream data compatibility.
   
   ```python
   class ProjectStatus(schema.Strict):
       category: list[str] = schema.factory(list)
       created: str | None = None
       description: str | None = None
       programming_language: list[str] = 
schema.Field(alias="programming-language", default_factory=list)
       doap: str | None = None
       homepage: str
       name: str
       pmc: str | None
       shortdesc: str | None = None
       repository: list[str | dict] = schema.factory(list)
       release: list[Release] = schema.factory(list)
       bug_database: str | None = schema.alias_opt("bug-database")
       download_page: str | None = schema.alias_opt("download-page")
       license: str | None = None
       mailing_list: str | None = schema.alias_opt("mailing-list")
       maintainer: list[MaintainerInfo] = schema.factory(list)
       implements: list[ImplementsInfo] = schema.factory(list)
       same_as: str | None = schema.alias_opt("sameAs")
       developer: list[MaintainerInfo] = schema.factory(list)
       modified: str | None = None
       chair: ChairInfo | None = None
       charter: str | None = None
       vendor: str | None = None
   ```
   
   #### `atr/datasources/apache.py` — `_update_committees` (lines 374-406)
   _needs modification_
   When committee_info is found, only `display_name` is stored. The charter 
should also be stored here.
   
   ```python
   async def _update_committees(
       data: db.Session, ldap_projects: LDAPProjectsData, committees_by_name: 
Mapping[str, Committee]
   ) -> tuple[int, int]:
       added_count = 0
       updated_count = 0
   
       # First create PMC committees
       for project in ldap_projects.projects:
           name = project.name
           # Skip non-PMC committees
           if project.pmc is not True:
               continue
   
           # Get or create PMC
           committee = await data.committee(key=name).get()
           if not committee:
               committee = sql.Committee(key=name)
               data.add(committee)
               added_count += 1
           else:
               updated_count += 1
   
           committee.committee_members = project.owners
           committee.committers = project.members
           # We create PMCs for now
           committee.is_podling = False
           committee_info = committees_by_name.get(name)
           if committee_info:
               committee.name = committee_info.display_name
   
           updated_count += 1
   
       return added_count, updated_count
   ```
   
   #### `atr/models/sql.py` — `Committee` (lines 659-680)
   _needs modification_
   The SQL Committee model is missing a `charter` field to persist the board 
resolution charter text.
   
   ```python
   class Committee(sqlmodel.SQLModel, table=True):
       key: str = sqlmodel.Field(unique=True, primary_key=True, 
**example("example"))
       name: str | None = sqlmodel.Field(default=None, **example("Example"))
       # True only if this is an incubator podling with a PPMC
       is_podling: bool = sqlmodel.Field(default=False)
   
       # 1-M: Committee -> [Committee]
       # M-1: Committee -> Committee
       child_committees: list["Committee"] = sqlmodel.Relationship(
           sa_relationship_kwargs=dict(
               backref=orm.backref("parent_committee", 
remote_side="Committee.key"),
           ),
       )
   
       # M-1: Committee -> Committee
       # 1-M: Committee -> [Committee]
       parent_committee_key: str | None = sqlmodel.Field(default=None, 
foreign_key="committee.key")
       # parent_committee: Optional["Committee"]
   
       # 1-M: Committee -> [Project]
       # M-1: Project -> Committee
       projects: list["Project"] = 
sqlmodel.Relationship(back_populates="committee")
   ```
   
   ### Where new code would go
   - `atr/datasources/apache.py` — after symbol Committee
     Add `charter` field to the Committee Pydantic model
   - `atr/models/sql.py` — after field `name` in Committee SQL model
     Add `charter` column to persist the committee charter from board resolution
   
   ### Proposed approach
   The refactoring has three parts: (1) Add `charter: str | None = None` to the 
`Committee` Pydantic model in `atr/datasources/apache.py` so that charter data 
from Whimsy's committee-info.json is parsed. (2) Add a `charter` field to the 
SQL `Committee` model and create an Alembic migration. (3) Store the charter in 
`_update_committees()` when `committee_info` is available. The `charter` field 
on `ProjectStatus` should remain (with a comment) because the upstream 
projects.apache.org JSON may include it for data compatibility, but it should 
not be stored on `Project` records. An alternative would be to transfer the 
charter from `ProjectStatus` to the committee during `_update_projects()`, but 
the cleaner approach is to source it from committee-info.json via 
`CommitteeData`.
   
   ### Suggested patches
   
   #### `atr/datasources/apache.py`
   Add charter field to the Committee Pydantic model for parsing Whimsy data
   
   ````diff
   --- a/atr/datasources/apache.py
   +++ b/atr/datasources/apache.py
   @@ -78,6 +78,7 @@ class Committee(schema.Strict):
        name: str
        display_name: str
        site: str | None
   +    charter: str | None = None
        description: str | None
        mail_list: str
        established: str | None
   ````
   
   #### `atr/datasources/apache.py`
   Store charter from committee-info during metadata update, and add clarifying 
comment on ProjectStatus.charter
   
   ````diff
   --- a/atr/datasources/apache.py
   +++ b/atr/datasources/apache.py
   @@ -237,6 +237,7 @@ class ProjectStatus(schema.Strict):
        chair: ChairInfo | None = None
   -    charter: str | None = None
   +    charter: str | None = None  # Present in upstream data but semantically 
belongs to PMC, not project
        vendor: str | None = None
   @@ -365,6 +366,7 @@ async def _update_committees(
            committee_info = committees_by_name.get(name)
            if committee_info:
                committee.name = committee_info.display_name
   +            committee.charter = committee_info.charter
    
            updated_count += 1
   ````
   
   #### `atr/models/sql.py`
   Add charter column to the SQL Committee model to persist the board 
resolution text
   
   ````diff
   --- a/atr/models/sql.py
   +++ b/atr/models/sql.py
   @@ -411,6 +411,7 @@ class Committee(sqlmodel.SQLModel, table=True):
        key: str = sqlmodel.Field(unique=True, primary_key=True, 
**example("example"))
        name: str | None = sqlmodel.Field(default=None, **example("Example"))
   +    charter: str | None = sqlmodel.Field(default=None)
        # True only if this is an incubator podling with a PPMC
        is_podling: bool = sqlmodel.Field(default=False)
   ````
   
   #### `tests/unit/datasources/testdata/committees.json`
   Add charter field to test data to verify parsing
   
   ````diff
   --- a/tests/unit/datasources/testdata/committees.json
   +++ b/tests/unit/datasources/testdata/committees.json
   @@ -5,6 +5,7 @@
      "committees": {
        "tooling": {
          "display_name": "Tooling",
   +      "charter": "The mission of the Apache Tooling project is the creation 
and maintenance of tools for the ASF.",
          "site": "http://tooling.apache.org/";,
          "description": "tools, tools, tools",
          "mail_list": "tooling",
   ````
   
   #### `tests/unit/datasources/test_apache.py`
   Add assertion to verify charter is parsed correctly
   
   ````diff
   --- a/tests/unit/datasources/test_apache.py
   +++ b/tests/unit/datasources/test_apache.py
   @@ -43,6 +43,7 @@ def test_committee_data_model():
        assert tooling.name == "tooling"
        assert len(tooling.roster) == 3
        assert "tn" in map(lambda x: x.id, tooling.roster)
   +    assert tooling.charter == "The mission of the Apache Tooling project is 
the creation and maintenance of tools for the ASF."
    
        assert len(tooling.chair) == 1
        assert "wave" in map(lambda x: x.id, tooling.chair)
   ````
   
   ### Open questions
   - An Alembic migration will be needed to add the `charter` column to the 
`committee` table. The exact migration file is not included here as it depends 
on the existing migration history.
   - Should the `charter` field from `ProjectStatus` be actively 
ignored/dropped, or is it acceptable to keep parsing it for data compatibility 
but simply not persist it?
   - The upstream Whimsy committee-info.json format should be verified to 
confirm the field name is exactly `charter` — the issue mentions 
committee-info.yaml but the code fetches JSON from Whimsy.
   
   ### Files examined
   - `atr/datasources/apache.py`
   - `tests/unit/datasources/test_apache.py`
   - `tests/unit/datasources/testdata/committees.json`
   - `atr/models/sql.py`
   - `atr/db/__init__.py`
   - `atr/storage/__init__.py`
   - `atr/principal.py`
   - `atr/registry.py`
   
   ### Related issues
   This issue appears related to: #468, #462.
   
   _All address data model issues with committee and project metadata schema_
   
   ---
   *Draft from a triage agent. A human reviewer should validate before merging 
any change. The agent did not run tests or verify diffs apply.*


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to