andrewmusselman commented on issue #1154:
URL: 
https://github.com/apache/tooling-trusted-releases/issues/1154#issuecomment-4505019470

   This should be ready to go whenever we want to incorporate it into the app: 
https://github.com/apache/tooling-asfswhid
   
   Happy to break any of these off into their own issues; @dave2wave @sbp 
@alitheg let me know.
   
   ## Backend checks
   
   1. **New `atr/tasks/checks/swhid.py` — cross-format equivalence check.** For 
each archive in a revision, extract → `directory_id(root)` → compare. Pass if 
all match, blocker if not. Uses existing `checks.resolve_archive_dir()` 
(`atr/tasks/checks/__init__.py:341`) and `targz.root_directory()`. Register in 
`atr/tasks/__init__.py:308` `resolve()`. Nothing in the current code answers 
"is the .tar.gz equivalent to the .zip?".
   
   2. **Augment `compare.source_trees` with SWHID.** 
`atr/tasks/checks/compare.py:260` shells out to `rsync --checksum --dry-run`. 
Replace the happy path with `directory_id(archive) == revision_id(repo, 
payload.sha)`; keep rsync as the fallback that explains *which* paths differ 
when they don't match. Removes hard `rsync` PATH dependency for the common 
case, and naturally handles the `.gitattributes export-ignore` / `eol=crlf` 
cases dave2wave flagged if we apply git's export rules before hashing.
   
   3. **New `atr/swhid.py` helper module** alongside `atr/hashes.py`. Thin 
async wrappers over `asfswhid.content_id_from_file` and 
`asfswhid.directory_id`, with export-ignore exclusions centralized so every 
caller uses the same invocation.
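
   As a rough illustration of items 1 and 3, the sketch below computes a
git-compatible directory identifier in pure Python and compares it across
extracted archive roots. It is a stand-in, not the `asfswhid` implementation:
the helper names here (`content_id`, `directory_id`, `check_equivalence`) are
hypothetical, export-ignore rules are not applied, and including empty
directories in the hash is an assumption of this sketch.

```python
import hashlib
import os
import stat
from pathlib import Path


def _object_sha1(kind: bytes, body: bytes) -> bytes:
    # Git-style object hash: SHA-1 over "<kind> <length>" + NUL + body.
    header = kind + b" " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).digest()


def content_id(data: bytes) -> str:
    # Per-file `cnt` identifier (same digest as a git blob).
    return "swh:1:cnt:" + _object_sha1(b"blob", data).hex()


def _tree_digest(root: Path) -> bytes:
    entries = []
    for child in root.iterdir():
        name = os.fsencode(child.name)
        st = child.lstat()
        if stat.S_ISLNK(st.st_mode):
            # A symlink is hashed as a blob holding its target path.
            target = os.fsencode(os.readlink(child))
            mode, digest = b"120000", _object_sha1(b"blob", target)
        elif stat.S_ISDIR(st.st_mode):
            mode, digest = b"40000", _tree_digest(child)
        else:
            # Assumes a regular file; only the executable bit is recorded.
            mode = b"100755" if st.st_mode & stat.S_IXUSR else b"100644"
            digest = _object_sha1(b"blob", child.read_bytes())
        # Git orders tree entries by name, with directories compared
        # as if their name ended in "/".
        sort_key = name + b"/" if mode == b"40000" else name
        entries.append((sort_key, mode, name, digest))
    body = b"".join(m + b" " + n + b"\0" + d for _, m, n, d in sorted(entries))
    return _object_sha1(b"tree", body)


def directory_id(root: Path) -> str:
    # `dir` identifier for an extracted archive root.
    return "swh:1:dir:" + _tree_digest(root).hex()


def check_equivalence(extracted_roots: dict[str, Path]) -> tuple[str, dict[str, str]]:
    # Pass if every archive yields the same directory identifier,
    # blocker otherwise; the per-archive identifiers are returned too.
    ids = {name: directory_id(root) for name, root in extracted_roots.items()}
    status = "pass" if len(set(ids.values())) <= 1 else "blocker"
    return status, ids
```

   In ATR the roots would presumably come from `checks.resolve_archive_dir()`
and `targz.root_directory()`, and the synchronous hashing could be moved off
the event loop with something like `asyncio.to_thread(directory_id, root)`.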
   
   ## Storage / data model
   
   4. **Extend `AttestableV2` in `atr/models/attestable.py`.**
    - Add `swhid: str | None` to `PathEntryV2` (line 64) — per-file `cnt` 
identifier alongside the existing `content_hash`.
    - Add top-level `directory_swhids: dict[str, str]` mapping archive rel-path 
→ `dir` SWHID. This is what downstream consumers (the SLSA work in 
`commons-release-plugin#422`, SWH indexing) actually want to read.
    - Alternatively, bump to `AttestableV3`, following the existing versioning pattern.
   
   5. **Add `SWHIDCheck` result kind in `atr/models/results.py`.** Mirror 
`HashingCheck` (line 46); add to the `Results` union at line 262. Makes SWHID 
values flow through the existing check-result UI without new plumbing.
   
   ## User / voter-facing surfaces
   
   6. **Show SWHIDs on the per-file report page** 
(`atr/templates/report-selected-path.html`). One block per archive showing the 
`dir` SWHID and what it matches against (sibling archives, git commit). Gives 
voters a third-party-verifiable identifier they can paste into 
`archive.softwareheritage.org` or recompute locally.
   
   7. **New template variable for the vote email.** `atr/construct.py:35` 
`TEMPLATE_VARIABLES`. Add `ARCHIVE_SWHIDS` so RMs can include them in the 
start-vote email body — out-of-band identifier voters can verify against their 
own download.
   
   8. **"Built from commit X" badge on the download / publication pages** 
(`atr/templates/download-all.html`). When archive-vs-git-tree SWHIDs match, 
surface the claim with links to the commit and the SWH browse URL.
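
   For items 6 to 8, the text that voters see could be assembled along these
lines. This is only a sketch: `browse_url` and `format_archive_swhids` are
hypothetical names, and how `ARCHIVE_SWHIDS` would actually be substituted by
`atr/construct.py` is not shown. The browse link relies on the Software
Heritage archive resolving a SWHID appended directly to its base URL.

```python
SWH_RESOLVER = "https://archive.softwareheritage.org/"


def browse_url(swhid: str) -> str:
    # The archive resolves identifiers appended to its base URL.
    return SWH_RESOLVER + swhid


def format_archive_swhids(directory_swhids: dict[str, str]) -> str:
    # One block per archive: relative path, identifier, browse link.
    # Sorted by path so the email body is stable between runs.
    blocks = [
        f"{rel_path}\n  {swhid}\n  {browse_url(swhid)}"
        for rel_path, swhid in sorted(directory_swhids.items())
    ]
    return "\n".join(blocks)
```

   The same mapping could feed both the report page and the email variable,
so the identifier a voter pastes or recomputes is identical in both places.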
   
   ## Cross-cutting
   
   9. **Public API endpoint.** `GET /api/release/{project}/{version}/swhids` 
returning `{rel_path: swhid}` in `atr/api/`. Intended for SLSA attestation 
builders, SWH indexers, and downstream integrity checkers.
   
   10. **SBOM enrichment.** Emit SWHID as evidence identity on CycloneDX 
components in `atr/sbom/cyclonedx.py`.
   
   11. **Compute `revision_id` against the GitHub TP payload.** 
`atr/models/github.py:53` already carries `payload.sha` and 
`payload.repository`. After the existing clone in 
`compare._checkout_github_source`, call `asfswhid.revision_id(checkout_dir, 
payload.sha)` and store it on the release record.
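
   A small sketch of the data shapes behind items 9 and 11, with no web
framework involved. Both function names are hypothetical. The `rev` helper
assumes that, for an ordinary git commit, the revision identifier is the
commit SHA-1 behind a `swh:1:rev:` prefix; it does not call
`asfswhid.revision_id`, which may do more (for example, checking the
checkout).

```python
import json
import re

_SHA1_HEX = re.compile(r"[0-9a-f]{40}")


def revision_swhid(commit_sha: str) -> str:
    # Assumption: the SHA-1 of a git commit doubles as the `rev` core id.
    sha = commit_sha.strip().lower()
    if _SHA1_HEX.fullmatch(sha) is None:
        raise ValueError(f"not a full 40-character SHA-1: {commit_sha!r}")
    return "swh:1:rev:" + sha


def swhids_response_body(directory_swhids: dict[str, str]) -> str:
    # JSON body shaped as {rel_path: swhid}, keys sorted for stable output.
    return json.dumps(dict(sorted(directory_swhids.items())), indent=2)
```

   Sorting the keys keeps the response byte-for-byte stable, which helps
consumers that cache or diff it.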
   
   ---
   
   If we pick one to land first, I'd suggest item 1: it is the clearest new 
capability, has the smallest surface area, and directly addresses the use case 
this issue was opened for. Item 6 then makes it visible.
   
   Happy to split any of these into their own issues — let me know which are 
worth scoping out.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

