andrewmusselman commented on issue #751: URL: https://github.com/apache/tooling-trusted-releases/issues/751#issuecomment-4356989911
@sbp @dave2wave @alitheg here is what my robot says to do about it, RFC ## Review of the investigation draft The draft is in good shape and the headline conclusion (skip Maven, no-op for PyPI / Hex / Artifact Hub, promotion is a marginal win) is the right call. Below are some corrections to the document, a narrower schema proposal than the one in the draft, and concrete patches for the changes that are worth landing alongside it. --- ## Document corrections ### Maven Central staging URL The doc abbreviates this as `repository.apache.org`. The actual configured value in `atr/models/sql.py` is: ``` https://repository.apache.org:4443/repository/maven-staging/{owner_namespace}/{package}/maven-metadata.xml ``` The port (`4443`) and `/repository/maven-staging/` path matter — RAO listens there for the staging repo. Either quote it in full or drop the parenthetical so the doc isn't subtly wrong. ### Docker Hub staging The "Yes (TODO in code)" entry is technically accurate, but the open question isn't "what staging URL?" — the commented `template_staging_url` in `sql.py` is identical to the production URL. The actual TODO is whether a separate staging-tag namespace is meaningful in our model at all: same repo with a `-staging` tag suffix, or a different repo entirely. That decision changes whether retag-promotion is even applicable, so it's worth surfacing in the doc. ### GitHub Releases The promotion mechanism (PATCH `prerelease: false`) is right, but the staging and production templates have *different shapes*: staging POSTs to `…/releases` (because creating a release is the staging act), while production GETs `…/releases/tags/v{version}`. That asymmetry is a small implementation gotcha that the rest of the platforms don't have — worth a sentence. ### npm staging The doc says staging is "dist-tags." Accurate, but it undersells a real difference from Test PyPI: npm has no separate staging *registry* — staging is just "publish under a non-`latest` dist-tag on the same registry." A bare `npm install package` won't pull the staged version, but the version is publicly listed and immutable once published. That's a meaningfully different trust model from Test PyPI and deserves to be called out next to the table. ### Erlang (Hex) The doc says "Not configured." Correct, but more importantly: `DistributionPlatform` in `sql.py` has no `HEX` member at all. Hex isn't just unconfigured for staging — it isn't modeled as a platform yet. That's the actual prerequisite for any further Erlang work, not promotion design. --- ## Schema proposal — narrower than the draft The draft proposes adding `supports_promotion`, `promotion_mechanism: str`, `promoted: bool`, `promoted_at`, and `staging_revision_key`. Most of these don't earn their keep: - **`supports_promotion: bool`** is derivable from `promotion_mechanism is not None`. Don't store both. - **`promotion_mechanism: str`** should be a typed enum (`PromotionMechanism.RETAG | DIST_TAG | PRERELEASE_FLAG`), not a free-form string. Stringly-typed schema columns always rot. - **`promoted: bool` and `promoted_at`** duplicate state. The existing `staging` boolean on `Distribution` already encodes this — when it flips false on a row that was previously `True`, that *is* the promotion. The `__upgrade_staging_to_final` method in `atr/storage/writers/distributions.py` already handles this transition. Don't add a parallel boolean. - **`staging_revision_key`** is the one that genuinely matters and is genuinely missing. The original issue calls this out: *"we have to check that the files being promoted are the same as those that were voted on."* Without recording which `Revision` was staged, ATR can't enforce that. It should land regardless of whether remote promotion is ever built, because it's also useful for the existing local re-upload path. So the actual schema delta is one nullable column. --- ## Patches ### Patch 1 — Add `staging_revision_key` to `Distribution` In `atr/models/sql.py`, inside `class Distribution(sqlmodel.SQLModel, table=True)` (around line 1185): ```diff staging: bool = sqlmodel.Field(default=False) pending: bool = sqlmodel.Field(default=False) retries: int = sqlmodel.Field(default=0) upload_date: datetime.datetime | None = sqlmodel.Field(default=None) api_url: str | None = sqlmodel.Field(default=None) web_url: str | None = sqlmodel.Field(default=None) created_by: str | None = sqlmodel.Field(default=None) + # Records which Revision was staged. Set when staging=True so that the + # finish-phase code can verify the artifacts being promoted match those + # that were voted on. See issue #751. + staging_revision_key: str | None = sqlmodel.Field( + default=None, foreign_key="revision.key" + ) ``` Then, at the `Distribution.record(...)` call site in `atr/storage/writers/distributions.py` (around line 143), accept and persist this value: ```diff async def record( self, release_key: models.safe.ReleaseKey, platform: models.sql.DistributionPlatform, owner_namespace: models.safe.Alphanumeric | None, package: models.safe.Alphanumeric, version: models.safe.VersionKey, staging: bool, pending: bool, upload_date: datetime.datetime | None, api_url: str | None = None, web_url: str | None = None, + staging_revision_key: str | None = None, ) -> tuple[models.sql.Distribution, bool]: namespace = str(owner_namespace) if owner_namespace else "" existing = await self.__data.distribution( str(release_key), platform, namespace, str(package), str(version) ).get() dist = models.sql.Distribution( platform=platform, release_key=str(release_key), owner_namespace=namespace, package=str(package), version=str(version), staging=staging, pending=pending, retries=0, upload_date=upload_date, api_url=api_url, web_url=web_url, created_by=self.__asf_uid, + staging_revision_key=staging_revision_key if staging else None, ) ``` The finish-phase code path (whichever caller drives `__upgrade_staging_to_final`) should then assert that `existing.staging_revision_key` equals the current `Revision.key` before promoting. That's a small follow-up rather than a schema concern. This will need an Alembic migration. The column is nullable, so existing rows are unaffected. ### Patch 2 — Stop hard-coding the staging-supported set In `atr/shared/distribution.py`, the `_template_url` function (around line 439): ```diff def _template_url( dd: distribution.Data, staging: bool | None = None, ) -> str: if staging is False: return dd.platform.value.template_url - supported = { - sql.DistributionPlatform.ARTIFACT_HUB, - sql.DistributionPlatform.PYPI, - sql.DistributionPlatform.MAVEN, - } - if dd.platform not in supported: - raise RuntimeError("Staging is currently supported only for ArtifactHub, PyPI and Maven Central.") - template_url = dd.platform.value.template_staging_url if template_url is None: - raise RuntimeError("This platform does not provide a staging API endpoint.") + raise RuntimeError( + f"Platform {dd.platform.value.name} does not provide a staging API endpoint." + ) return template_url ``` Rationale: the whitelist duplicates knowledge that's already in the enum — every platform with a `template_staging_url` already declares it. The whitelist is a footgun: when someone adds a new staging URL (for example, enabling GitHub Releases), they have to remember to update this set or staging silently breaks for the new platform. The enum should be the single source of truth, and the error message becomes more informative for free. ### Patch 3 — Decide `GITHUB`: enable or delete In `atr/models/sql.py`, around line 117, the commented-out enum member: ```python # GITHUB = DistributionPlatformValue( # name="GitHub", # gh_slug="github", # template_url="https://api.github.com/repos/{owner_namespace}/{package}/releases/tags/v{version}", # # Combine with {"prerelease": true} # template_staging_url="https://api.github.com/repos/{owner_namespace}/{package}/releases", # requires_owner_namespace=True, # ) ``` There are also two corresponding commented `# case models.sql.DistributionPlatform.GITHUB:` references in `atr/shared/distribution.py` (around lines 201 and 255). This block has been commented out long enough that it's noise. Either enable it (GitHub Releases is the cheapest promotion target after npm — one PATCH call, and the upload-date / web-url extractors already have placeholder cases waiting for it) or delete it. Leaving it commented while the investigation recommends enabling it is the worst of both worlds. --- ## Out-of-scope items to file separately The thread is currently mixing investigation, decision, and implementation. To close this issue cleanly: 1. **npm dist-tag promotion** — the most valuable promotion to actually build, per the doc's own priority list. File as a child issue. The implementation surface is small (one HTTP call) but it depends on npm distribution support being mature, which it isn't yet. 2. **GitHub Releases promotion** — depends on enabling the `GITHUB` platform first. File as a follow-up to whichever direction Patch 3 lands. 3. **Docker Hub retag promotion** — feasible per the doc, but requires manifest-digest tracking (a third schema concern beyond `staging_revision_key`). Not worth scoping until Docker Hub distribution is actually being used in anger. 4. **GH workflow integration** — there's an earlier comment in the thread asking to "add in information about GH workflows for staging and publishing to these models." The draft has a `gh_slug` column for display but doesn't engage with what ATR should *do* with the slug beyond label things. Either scope this out explicitly ("workflow integration out of scope for this issue") or open a separate design issue. --- ## Recommended close conditions 1. Merge the investigation document with the corrections above. 2. Land **Patch 1** (`staging_revision_key`) and **Patch 2** (whitelist removal). Both are independently useful regardless of any promotion work. 3. Make a yes/no call on **Patch 3** (`GITHUB`). 4. **Don't** build the generic `supports_promotion` / `promotion_mechanism` plumbing. Add it when there's a second platform that actually needs it. YAGNI. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
