asf-tooling commented on issue #911:
URL: 
https://github.com/apache/tooling-trusted-releases/issues/911#issuecomment-4410068681

   <!-- gofannon-issue-triage-bot v2 -->
   
   **Automated triage** — analyzed at `main@2da7807a`
   
   **Type:** `new_feature`  •  **Classification:** `actionable`  •  
**Confidence:** `medium`
   **Application domain(s):** `shared_infrastructure`, `release_lifecycle`, 
`cryptographic_keys`
   
   ### Summary
   Issue #911 requests a new SQLModel table for tracking release artifact 
metadata as part of a 'Release Catalog'. The proposed model links artifacts to 
their GPG signing keys, signature files, checksum files, classification 
results, and SVN revision provenance. The codebase currently has 
`ReleaseFileState` which tracks file-level state per revision within ATR, but 
does not have a dedicated table relating artifacts to their cryptographic 
metadata and distribution provenance. The new model would be defined in 
`atr/models/sql.py` with a corresponding query builder in `atr/db/__init__.py`.
   
   ### Where this lives in the code today
   
   #### `atr/models/sql.py` — `KeyLink` (lines 465-467)
   _extension point_
   Shows how the existing schema links entities to PublicSigningKey via 
fingerprint foreign keys, which the new ArtifactCatalog model would also do.
   
   ```python
   class KeyLink(sqlmodel.SQLModel, table=True):
       committee_key: str = sqlmodel.Field(foreign_key="committee.key", 
primary_key=True)
       key_fingerprint: str = 
sqlmodel.Field(foreign_key="publicsigningkey.fingerprint", primary_key=True)
   ```
   
   #### `atr/db/__init__.py` — `Session.release_file_state` (lines 703-727)
   _extension point_
   Shows the pattern for query builder methods in the Session class; a similar 
`artifact_catalog` method would be added.
   
   ```python
       def release_file_state(
           self,
           release_key: Opt[str] = NOT_SET,
           path: Opt[str] = NOT_SET,
           since_revision_seq: Opt[int] = NOT_SET,
           present: Opt[bool] = NOT_SET,
           content_hash: Opt[str | None] = NOT_SET,
           classification: Opt[str | None] = NOT_SET,
       ) -> Query[sql.ReleaseFileState]:
           query = sqlmodel.select(sql.ReleaseFileState)
   
           if is_defined(release_key):
               query = query.where(sql.ReleaseFileState.release_key == 
release_key)
           if is_defined(path):
               query = query.where(sql.ReleaseFileState.path == path)
           if is_defined(since_revision_seq):
               query = query.where(sql.ReleaseFileState.since_revision_seq == 
since_revision_seq)
           if is_defined(present):
               query = query.where(sql.ReleaseFileState.present == present)
           if is_defined(content_hash):
               query = query.where(sql.ReleaseFileState.content_hash == 
content_hash)
           if is_defined(classification):
               query = query.where(sql.ReleaseFileState.classification == 
classification)
   
           return Query(self, query)
   ```
   
   #### `atr/models/sql.py` — `ReleasePhase` (lines 177-185)
   _currently does this_
   The release phase model shows lifecycle context; artifact catalog entries 
would track artifacts that reach the RELEASE phase (published to dist:release).
   
   ```python
   class ReleasePhase(enum.StrEnum):
       # Step 1: The candidate files are added from external sources and 
checked by ATR
       RELEASE_CANDIDATE_DRAFT = "release_candidate_draft"
       # Step 2: The project members are voting on the candidate release
       RELEASE_CANDIDATE = "release_candidate"
       # Step 3: The release files are being put in place
       RELEASE_PREVIEW = "release_preview"
       # Step 4: The release has been announced
       RELEASE = "release"
   ```
   
   #### `atr/storage/readers/releases.py` — `GeneralPublic.path_info` (lines 
57-80)
   _currently does this_
   Shows how classifications and path metadata are currently read; the new 
artifact catalog could be populated from this data as artifacts are published.
   
   ```python
       async def path_info(self, release: sql.Release, all_paths: 
list[safe.RelPath]) -> types.PathInfo | None:
           info = types.PathInfo()
           latest_revision_number = release.latest_revision_number
           if latest_revision_number is None:
               return None
           await self.__successes_errors_warnings(release, 
release.safe_latest_revision_number, info)
           base_path = paths.release_directory(release)
           revision_seq = int(str(release.safe_latest_revision_number))
           db_classifications = await 
self.__data.release_file_classifications_at(release.key, revision_seq)
           source_matcher, binary_matcher = classify.matchers_from_policy(
               release.project.policy_source_artifact_paths,
               release.project.policy_binary_artifact_paths,
               base_path,
           )
           for path in all_paths:
               db_value = db_classifications.get(str(path))
               if db_value is not None:
                   info.file_types[path] = classify.FileType(db_value)
               else:
                   info.file_types[path] = await classify.classify(
                       path, base_path=base_path, 
source_matcher=source_matcher, binary_matcher=binary_matcher
                   )
           self.__compute_checker_stats(info, all_paths)
           return info
   ```
   
   ### Where new code would go
   - `atr/models/sql.py` — after symbol ReleaseFileState
     The new ArtifactCatalog model should be placed near ReleaseFileState since 
both track file-level metadata for releases.
   - `atr/db/__init__.py` — after symbol Session.release_file_state
     A new query builder method for the ArtifactCatalog model following the 
established pattern.
   
   ### Proposed approach
   Define a new `ArtifactCatalog` SQLModel table in `atr/models/sql.py` with 
the fields specified in the issue: `artifact_path`, `key_fingerprint` (FK to 
PublicSigningKey), `signature_path`, `checksum_path`, `classification`, 
`project_key` (FK to Project), `version_key`, and `svn_revision`. The composite 
natural key is (project_key, version_key, artifact_path) since an artifact is 
uniquely identified within a project version. Add a corresponding query builder 
method in `atr/db/__init__.py` following the session pattern. An Alembic 
migration will also be needed but is not included here since it should be 
auto-generated after the model is finalized.
   
   The note in the issue that 'not all of a release's artifacts were added in 
the same revision' justifies the per-artifact `svn_revision` field rather than 
a release-level one. The `checksum_path` field with 'pick the best' logic 
suggests storing only the strongest available hash file path (SHA-512 > SHA-256 
> MD5).
   
   ### Suggested patches
   
   #### `atr/models/sql.py`
   Add the ArtifactCatalog SQLModel table with the fields described in the 
issue.
   
   ````diff
   --- a/atr/models/sql.py
   +++ b/atr/models/sql.py
   @@ -348,6 +348,34 @@ sqlmodel.SQLModel.metadata = sqlalchemy.MetaData(
    class KeyLink(sqlmodel.SQLModel, table=True):
        committee_key: str = sqlmodel.Field(foreign_key="committee.key", 
primary_key=True)
        key_fingerprint: str = 
sqlmodel.Field(foreign_key="publicsigningkey.fingerprint", primary_key=True)
    
    
   +# ArtifactCatalog:
   +class ArtifactCatalog(sqlmodel.SQLModel, table=True):
   +    """Tracks metadata for release artifacts in the Release Catalog.
   +
   +    Each entry represents a single artifact within a project version.
   +    Historically not all of a release's artifacts were added in the same
   +    SVN revision, so svn_revision is tracked per-artifact.
   +    """
   +
   +    # Composite primary key: an artifact is unique within a project+version
   +    project_key: str = sqlmodel.Field(foreign_key="project.key", 
primary_key=True, index=True)
   +    version_key: str = sqlmodel.Field(primary_key=True, index=True)
   +    artifact_path: str = sqlmodel.Field(primary_key=True, 
**example("apache-example-1.0.0.tar.gz"))
   +
   +    # The GPG public key fingerprint used to sign the artifact
   +    key_fingerprint: str | None = sqlmodel.Field(
   +        default=None, foreign_key="publicsigningkey.fingerprint", index=True
   +    )
   +
   +    # Path to detached signature file (not a URL)
   +    signature_path: str | None = sqlmodel.Field(default=None)
   +
   +    # Path to the best available checksum file (SHA-512 preferred)
   +    checksum_path: str | None = sqlmodel.Field(default=None)
   +
   +    # Result of the classify process (e.g. source, binary, docs)
   +    classification: str | None = sqlmodel.Field(default=None)
   +
   +    # SVN revision where artifact was added to dist:release (via mv or add)
   +    svn_revision: str | None = sqlmodel.Field(default=None, index=True)
   +
    
    # PersonalAccessToken:
    class PersonalAccessToken(sqlmodel.SQLModel, table=True):
   ````
   
   #### `atr/db/__init__.py`
   Add a query builder method for ArtifactCatalog following the established 
Session pattern.
   
   ````diff
   --- a/atr/db/__init__.py
   +++ b/atr/db/__init__.py
   @@ -140,6 +140,33 @@ class Session(sqlalchemy.ext.asyncio.AsyncSession):
        async def begin_immediate(self) -> None:
            await self.execute(sqlalchemy.text("BEGIN IMMEDIATE"))
    
   +    def artifact_catalog(
   +        self,
   +        project_key: Opt[str] = NOT_SET,
   +        version_key: Opt[str] = NOT_SET,
   +        artifact_path: Opt[str] = NOT_SET,
   +        key_fingerprint: Opt[str | None] = NOT_SET,
   +        signature_path: Opt[str | None] = NOT_SET,
   +        checksum_path: Opt[str | None] = NOT_SET,
   +        classification: Opt[str | None] = NOT_SET,
   +        svn_revision: Opt[str | None] = NOT_SET,
   +    ) -> Query[sql.ArtifactCatalog]:
   +        query = sqlmodel.select(sql.ArtifactCatalog)
   +
   +        if is_defined(project_key):
   +            query = query.where(sql.ArtifactCatalog.project_key == 
project_key)
   +        if is_defined(version_key):
   +            query = query.where(sql.ArtifactCatalog.version_key == 
version_key)
   +        if is_defined(artifact_path):
   +            query = query.where(sql.ArtifactCatalog.artifact_path == 
artifact_path)
   +        if is_defined(key_fingerprint):
   +            query = query.where(sql.ArtifactCatalog.key_fingerprint == 
key_fingerprint)
   +        if is_defined(signature_path):
   +            query = query.where(sql.ArtifactCatalog.signature_path == 
signature_path)
   +        if is_defined(checksum_path):
   +            query = query.where(sql.ArtifactCatalog.checksum_path == 
checksum_path)
   +        if is_defined(classification):
   +            query = query.where(sql.ArtifactCatalog.classification == 
classification)
   +        if is_defined(svn_revision):
   +            query = query.where(sql.ArtifactCatalog.svn_revision == 
svn_revision)
   +
   +        return Query(self, query)
   +
        def check_result(
            self,
            id: Opt[int] = NOT_SET,
   ````
   
   ### Open questions
   - Should `version_key` be a foreign key referencing a release table, or 
remain a plain indexed string to allow catalog entries independent of 
ATR-managed releases?
   - What values should `classification` hold? Should it be a StrEnum (like the 
classify.FileType used elsewhere) or a free-form string?
   - Should there be a relationship defined back to `Release` (via release_key) 
or is the (project_key, version_key) pair sufficient?
   - How and when are ArtifactCatalog entries populated — during SVN import 
tasks, at release publication, or retroactively from dist:release scanning?
   - The exact location of `ReleaseFileState` in sql.py is unknown (file was 
truncated), so the placement anchor may need adjustment.
   
   ### Files examined
   - `atr/models/sql.py`
   - `atr/db/__init__.py`
   - `atr/storage/__init__.py`
   - `atr/storage/writers/release.py`
   - `atr/storage/writers/keys.py`
   - `atr/storage/writers/ssh.py`
   - `atr/storage/readers/releases.py`
   - `atr/storage/types.py`
   
   ---
   *Draft from a triage agent. A human reviewer should validate before merging 
any change. The agent did not run tests or verify diffs apply.*


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to