asf-tooling commented on issue #911:
URL:
https://github.com/apache/tooling-trusted-releases/issues/911#issuecomment-4410068681
<!-- gofannon-issue-triage-bot v2 -->
**Automated triage** — analyzed at `main@2da7807a`
**Type:** `new_feature` • **Classification:** `actionable` •
**Confidence:** `medium`
**Application domain(s):** `shared_infrastructure`, `release_lifecycle`,
`cryptographic_keys`
### Summary
Issue #911 requests a new SQLModel table for tracking release artifact
metadata as part of a 'Release Catalog'. The proposed model links artifacts to
their GPG signing keys, signature files, checksum files, classification
results, and SVN revision provenance. The codebase currently has
`ReleaseFileState` which tracks file-level state per revision within ATR, but
does not have a dedicated table relating artifacts to their cryptographic
metadata and distribution provenance. The new model would be defined in
`atr/models/sql.py` with a corresponding query builder in `atr/db/__init__.py`.
### Where this lives in the code today
#### `atr/models/sql.py` — `KeyLink` (lines 465-467)
_extension point_
Shows how the existing schema links entities to PublicSigningKey via
fingerprint foreign keys, which the new ArtifactCatalog model would also do.
```python
class KeyLink(sqlmodel.SQLModel, table=True):
committee_key: str = sqlmodel.Field(foreign_key="committee.key",
primary_key=True)
key_fingerprint: str =
sqlmodel.Field(foreign_key="publicsigningkey.fingerprint", primary_key=True)
```
#### `atr/db/__init__.py` — `Session.release_file_state` (lines 703-727)
_extension point_
Shows the pattern for query builder methods in the Session class; a similar
`artifact_catalog` method would be added.
```python
def release_file_state(
self,
release_key: Opt[str] = NOT_SET,
path: Opt[str] = NOT_SET,
since_revision_seq: Opt[int] = NOT_SET,
present: Opt[bool] = NOT_SET,
content_hash: Opt[str | None] = NOT_SET,
classification: Opt[str | None] = NOT_SET,
) -> Query[sql.ReleaseFileState]:
query = sqlmodel.select(sql.ReleaseFileState)
if is_defined(release_key):
query = query.where(sql.ReleaseFileState.release_key ==
release_key)
if is_defined(path):
query = query.where(sql.ReleaseFileState.path == path)
if is_defined(since_revision_seq):
query = query.where(sql.ReleaseFileState.since_revision_seq ==
since_revision_seq)
if is_defined(present):
query = query.where(sql.ReleaseFileState.present == present)
if is_defined(content_hash):
query = query.where(sql.ReleaseFileState.content_hash ==
content_hash)
if is_defined(classification):
query = query.where(sql.ReleaseFileState.classification ==
classification)
return Query(self, query)
```
#### `atr/models/sql.py` — `ReleasePhase` (lines 177-185)
_currently does this_
The release phase model shows lifecycle context; artifact catalog entries
would track artifacts that reach the RELEASE phase (published to dist:release).
```python
class ReleasePhase(enum.StrEnum):
# Step 1: The candidate files are added from external sources and
checked by ATR
RELEASE_CANDIDATE_DRAFT = "release_candidate_draft"
# Step 2: The project members are voting on the candidate release
RELEASE_CANDIDATE = "release_candidate"
# Step 3: The release files are being put in place
RELEASE_PREVIEW = "release_preview"
# Step 4: The release has been announced
RELEASE = "release"
```
#### `atr/storage/readers/releases.py` — `GeneralPublic.path_info` (lines
57-80)
_currently does this_
Shows how classifications and path metadata are currently read; the new
artifact catalog could be populated from this data as artifacts are published.
```python
async def path_info(self, release: sql.Release, all_paths:
list[safe.RelPath]) -> types.PathInfo | None:
info = types.PathInfo()
latest_revision_number = release.latest_revision_number
if latest_revision_number is None:
return None
await self.__successes_errors_warnings(release,
release.safe_latest_revision_number, info)
base_path = paths.release_directory(release)
revision_seq = int(str(release.safe_latest_revision_number))
db_classifications = await
self.__data.release_file_classifications_at(release.key, revision_seq)
source_matcher, binary_matcher = classify.matchers_from_policy(
release.project.policy_source_artifact_paths,
release.project.policy_binary_artifact_paths,
base_path,
)
for path in all_paths:
db_value = db_classifications.get(str(path))
if db_value is not None:
info.file_types[path] = classify.FileType(db_value)
else:
info.file_types[path] = await classify.classify(
path, base_path=base_path,
source_matcher=source_matcher, binary_matcher=binary_matcher
)
self.__compute_checker_stats(info, all_paths)
return info
```
### Where new code would go
- `atr/models/sql.py` — after symbol ReleaseFileState
The new ArtifactCatalog model should be placed near ReleaseFileState since
both track file-level metadata for releases.
- `atr/db/__init__.py` — after symbol Session.release_file_state
A new query builder method for the ArtifactCatalog model following the
established pattern.
### Proposed approach
Define a new `ArtifactCatalog` SQLModel table in `atr/models/sql.py` with
the fields specified in the issue: `artifact_path`, `key_fingerprint` (FK to
PublicSigningKey), `signature_path`, `checksum_path`, `classification`,
`project_key` (FK to Project), `version_key`, and `svn_revision`. The composite
natural key is (project_key, version_key, artifact_path) since an artifact is
uniquely identified within a project version. Add a corresponding query builder
method in `atr/db/__init__.py` following the session pattern. An Alembic
migration will also be needed but is not included here since it should be
auto-generated after the model is finalized.
The note in the issue that 'not all of a release's artifacts were added in
the same revision' justifies the per-artifact `svn_revision` field rather than
a release-level one. The `checksum_path` field with 'pick the best' logic
suggests storing only the strongest available hash file path (SHA-512 > SHA-256
> MD5).
### Suggested patches
#### `atr/models/sql.py`
Add the ArtifactCatalog SQLModel table with the fields described in the
issue.
````diff
--- a/atr/models/sql.py
+++ b/atr/models/sql.py
@@ -348,6 +348,34 @@ sqlmodel.SQLModel.metadata = sqlalchemy.MetaData(
class KeyLink(sqlmodel.SQLModel, table=True):
committee_key: str = sqlmodel.Field(foreign_key="committee.key",
primary_key=True)
key_fingerprint: str =
sqlmodel.Field(foreign_key="publicsigningkey.fingerprint", primary_key=True)
+# ArtifactCatalog:
+class ArtifactCatalog(sqlmodel.SQLModel, table=True):
+ """Tracks metadata for release artifacts in the Release Catalog.
+
+ Each entry represents a single artifact within a project version.
+ Historically not all of a release's artifacts were added in the same
+ SVN revision, so svn_revision is tracked per-artifact.
+ """
+
+ # Composite primary key: an artifact is unique within a project+version
+ project_key: str = sqlmodel.Field(foreign_key="project.key",
primary_key=True, index=True)
+ version_key: str = sqlmodel.Field(primary_key=True, index=True)
+ artifact_path: str = sqlmodel.Field(primary_key=True,
**example("apache-example-1.0.0.tar.gz"))
+
+ # The GPG public key fingerprint used to sign the artifact
+ key_fingerprint: str | None = sqlmodel.Field(
+ default=None, foreign_key="publicsigningkey.fingerprint", index=True
+ )
+
+ # Path to detached signature file (not a URL)
+ signature_path: str | None = sqlmodel.Field(default=None)
+
+ # Path to the best available checksum file (SHA-512 preferred)
+ checksum_path: str | None = sqlmodel.Field(default=None)
+
+ # Result of the classify process (e.g. source, binary, docs)
+ classification: str | None = sqlmodel.Field(default=None)
+
+ # SVN revision where artifact was added to dist:release (via mv or add)
+ svn_revision: str | None = sqlmodel.Field(default=None, index=True)
+
# PersonalAccessToken:
class PersonalAccessToken(sqlmodel.SQLModel, table=True):
````
#### `atr/db/__init__.py`
Add a query builder method for ArtifactCatalog following the established
Session pattern.
````diff
--- a/atr/db/__init__.py
+++ b/atr/db/__init__.py
@@ -140,6 +140,33 @@ class Session(sqlalchemy.ext.asyncio.AsyncSession):
async def begin_immediate(self) -> None:
await self.execute(sqlalchemy.text("BEGIN IMMEDIATE"))
+ def artifact_catalog(
+ self,
+ project_key: Opt[str] = NOT_SET,
+ version_key: Opt[str] = NOT_SET,
+ artifact_path: Opt[str] = NOT_SET,
+ key_fingerprint: Opt[str | None] = NOT_SET,
+ signature_path: Opt[str | None] = NOT_SET,
+ checksum_path: Opt[str | None] = NOT_SET,
+ classification: Opt[str | None] = NOT_SET,
+ svn_revision: Opt[str | None] = NOT_SET,
+ ) -> Query[sql.ArtifactCatalog]:
+ query = sqlmodel.select(sql.ArtifactCatalog)
+
+ if is_defined(project_key):
+ query = query.where(sql.ArtifactCatalog.project_key ==
project_key)
+ if is_defined(version_key):
+ query = query.where(sql.ArtifactCatalog.version_key ==
version_key)
+ if is_defined(artifact_path):
+ query = query.where(sql.ArtifactCatalog.artifact_path ==
artifact_path)
+ if is_defined(key_fingerprint):
+ query = query.where(sql.ArtifactCatalog.key_fingerprint ==
key_fingerprint)
+ if is_defined(signature_path):
+ query = query.where(sql.ArtifactCatalog.signature_path ==
signature_path)
+ if is_defined(checksum_path):
+ query = query.where(sql.ArtifactCatalog.checksum_path ==
checksum_path)
+ if is_defined(classification):
+ query = query.where(sql.ArtifactCatalog.classification ==
classification)
+ if is_defined(svn_revision):
+ query = query.where(sql.ArtifactCatalog.svn_revision ==
svn_revision)
+
+ return Query(self, query)
+
def check_result(
self,
id: Opt[int] = NOT_SET,
````
### Open questions
- Should `version_key` be a foreign key referencing a release table, or
remain a plain indexed string to allow catalog entries independent of
ATR-managed releases?
- What values should `classification` hold? Should it be a StrEnum (like the
classify.FileType used elsewhere) or a free-form string?
- Should there be a relationship defined back to `Release` (via release_key)
or is the (project_key, version_key) pair sufficient?
- How and when are ArtifactCatalog entries populated — during SVN import
tasks, at release publication, or retroactively from dist:release scanning?
- The exact location of `ReleaseFileState` in sql.py is unknown (file was
truncated), so the placement anchor may need adjustment.
### Files examined
- `atr/models/sql.py`
- `atr/db/__init__.py`
- `atr/storage/__init__.py`
- `atr/storage/writers/release.py`
- `atr/storage/writers/keys.py`
- `atr/storage/writers/ssh.py`
- `atr/storage/readers/releases.py`
- `atr/storage/types.py`
---
*Draft from a triage agent. A human reviewer should validate before merging
any change. The agent did not run tests or verify diffs apply.*
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]