asf-tooling commented on issue #1068:
URL: https://github.com/apache/tooling-trusted-releases/issues/1068#issuecomment-4409827864

   <!-- gofannon-issue-triage-bot v2 -->
   
   **Automated triage** — analyzed at `main@2da7807a`
   
   **Type:** `new_feature`  •  **Classification:** `actionable`  •  **Confidence:** `medium`
   **Application domain(s):** `automated_checks`, `shared_infrastructure`
   
   ### Summary
   The issue requests automated version-staleness monitoring and hash verification for external tools installed in the Dockerfile. However, the issue's claim that 'syft and cyclonedx-cli are installed via curl without hash verification' does not match the current code: both already have SHA256 verification (lines 61 and 63–65 for syft, lines 71 and 72–74 for cyclonedx-cli, in `Dockerfile.alpine`). The remaining valid asks are (1) a CI script that detects stale tool versions and (2) documentation of an update policy. The existing `scripts/check_when_dependencies_updated.py` provides a clear pattern for implementing the freshness check. Dependabot already monitors Docker base images, but it does NOT monitor tool versions embedded as ENV variables.
   
   ### Where this lives in the code today
   
   #### `scripts/check_when_dependencies_updated.py` — `main` (lines 28-54)
   _extension point_
   This is the pattern to follow: a CI-friendly freshness checker that exits non-zero when the locked Python dependencies (the `exclude-newer` timestamp in `uv.lock`) are too old.
   
   ```python
   def main() -> None:
       lock_path = pathlib.Path("uv.lock")
       if not lock_path.exists():
           print("ERROR: uv.lock not found", file=sys.stderr)
           sys.exit(1)
   
       exclude_newer = _parse_exclude_newer(lock_path)
       if exclude_newer is None:
           print("ERROR: No exclude-newer timestamp in uv.lock", file=sys.stderr)
           print("Run: make update-deps", file=sys.stderr)
           sys.exit(1)
   
       timestamp = _parse_timestamp(exclude_newer)
       if timestamp is None:
           print(f"ERROR: Could not parse timestamp: {exclude_newer}", file=sys.stderr)
           sys.exit(1)
   
       now = datetime.datetime.now(datetime.UTC)
       age = now - timestamp
   
       if age > datetime.timedelta(days=_MAX_AGE_DAYS):
           print(f"ERROR: Dependencies are {age.days} days old (the limit is {_MAX_AGE_DAYS} days)", file=sys.stderr)
           print(f"Last updated: {exclude_newer}", file=sys.stderr)
           print("Run: make update-deps", file=sys.stderr)
           sys.exit(1)
   
       print(f"OK: Dependencies are {age.days} days old (the limit is {_MAX_AGE_DAYS} days)")
   ```
   
   ### Where new code would go
   - `scripts/check_dockerfile_tool_versions.py` — new file
     New CI script to check staleness of external tool versions in 
Dockerfile.alpine, following the pattern of check_when_dependencies_updated.py.
   - `.pre-commit-config.yaml` — after existing hooks
     The check could be integrated as a pre-commit hook (following the pattern 
of check_when_dependencies_updated.py) rather than a separate workflow step.
   
   ### Proposed approach
   The primary valid ask is a staleness check for Dockerfile tool versions. Since hash verification is already present for syft, cyclonedx-cli, and RAT (so the issue's premise is partly out of date), the main gap is automated monitoring of version age. A new script, `scripts/check_dockerfile_tool_versions.py`, should parse the ENV version declarations in `Dockerfile.alpine`, query the GitHub Releases API (or an equivalent source) for the release date of each pinned version, and fail if any tool exceeds a configurable maximum age (e.g., 90 days). The script should then be wired into either the pre-commit framework or the analyze workflow.
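
The age rule itself can live in a pure function, separate from any network access, so it is easy to unit-test. A minimal sketch, assuming the release dates have already been looked up (`None` meaning "could not be determined"); the helper name `evaluate_tool_ages`, its `(errors, warnings)` return shape, and the sample data are hypothetical, and the 90-day default is the example limit mentioned above:

```python
from __future__ import annotations

import datetime


def evaluate_tool_ages(
    release_dates: dict[str, datetime.datetime | None],
    now: datetime.datetime,
    max_age_days: int = 90,
) -> tuple[list[str], list[str]]:
    """Split pinned tools into age errors and "unknown date" warnings.

    Hypothetical helper: keys are labels such as "SYFT_VERSION=1.0.0" and
    values are timezone-aware release dates, or None when the lookup failed.
    """
    errors: list[str] = []
    warnings: list[str] = []
    limit = datetime.timedelta(days=max_age_days)
    for label, released in sorted(release_dates.items()):
        if released is None:
            # Unknown date (e.g. a rate-limited API call): warn, do not fail.
            warnings.append(f"{label}: release date unknown")
            continue
        age = now - released
        if age > limit:
            errors.append(f"{label} is {age.days} days old (max {max_age_days})")
    return errors, warnings


# Hypothetical sample data, not real release dates.
utc = datetime.timezone.utc
now = datetime.datetime(2025, 6, 1, tzinfo=utc)
errors, warnings = evaluate_tool_ages(
    {
        "SYFT_VERSION=1.0.0": datetime.datetime(2025, 1, 1, tzinfo=utc),
        "PARLAY_VERSION=0.5.0": datetime.datetime(2025, 5, 1, tzinfo=utc),
        "SBOMQS_VERSION=1.0.0": None,
    },
    now,
)
print(errors)    # ['SYFT_VERSION=1.0.0 is 151 days old (max 90)']
print(warnings)  # ['SBOMQS_VERSION=1.0.0: release date unknown']
```

A `main()` would then print the warnings, print the errors to stderr, and exit non-zero only when `errors` is non-empty, mirroring `check_when_dependencies_updated.py`. Passing `max_age_days` explicitly is one simple way to keep the limit configurable.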
   
   The script should handle API rate limits and network failures gracefully (on a 403 or a network error, warn rather than fail) and should support a `GITHUB_TOKEN` environment variable for authenticated requests. Apache RAT is distributed through Apache mirrors rather than GitHub Releases, so it needs a different strategy (checking the Apache mirrors, or skipping it with a note). The `go install`-based tools (parlay, sbomqs) should also be monitored, particularly since their Go module hashes aren't explicitly pinned in the Dockerfile.
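
The graceful-failure rule can be sketched with only the standard library (`urllib`), which is one possible answer to the `requests` question listed under Open questions. A minimal sketch under stated assumptions: the name `fetch_release_date` and the injectable `opener` parameter are hypothetical; it calls the same GitHub endpoint as the suggested patch (`GET /repos/{owner}/{repo}/releases/tags/{tag}`); a 404 is treated as "try the other tag spelling", while a 403, any other HTTP error, or a network failure prints a warning and returns `None` instead of failing the check:

```python
from __future__ import annotations

import datetime
import json
import os
import sys
import urllib.error
import urllib.parse
import urllib.request
from typing import Any, Callable


def fetch_release_date(
    repo: str,
    version: str,
    opener: Callable[..., Any] = urllib.request.urlopen,
) -> datetime.datetime | None:
    """Return the GitHub release date of repo@version, or None if unavailable.

    Stdlib-only sketch; `opener` is injectable so tests need no network.
    """
    headers = {"Accept": "application/vnd.github+json"}
    token = os.environ.get("GITHUB_TOKEN")
    if token:
        # Authenticated requests get a higher rate limit.
        headers["Authorization"] = f"Bearer {token}"
    # Tags may or may not carry a "v" prefix; try both, without duplicates.
    bare = version.removeprefix("v")
    for tag in dict.fromkeys((f"v{bare}", version)):
        url = f"https://api.github.com/repos/{repo}/releases/tags/{urllib.parse.quote(tag)}"
        request = urllib.request.Request(url, headers=headers)
        try:
            with opener(request, timeout=10) as resp:
                published = json.load(resp).get("published_at")
        except urllib.error.HTTPError as exc:
            if exc.code == 404:
                continue  # no release under this tag spelling; try the next
            # 403 (rate limit) or any other HTTP error: warn, don't fail.
            print(f"WARNING: {repo} {tag}: HTTP {exc.code}", file=sys.stderr)
            return None
        except (urllib.error.URLError, TimeoutError, ValueError) as exc:
            # Network failure or malformed JSON: warn, don't fail.
            print(f"WARNING: {repo} {tag}: {exc}", file=sys.stderr)
            return None
        if published:
            return datetime.datetime.fromisoformat(published.replace("Z", "+00:00"))
    return None
```

Injecting `opener` lets a unit test simulate a rate-limited (403) or offline run without touching the network; in CI the default `urllib.request.urlopen` would be used.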
   
   ### Suggested patches
   
   #### `scripts/check_dockerfile_tool_versions.py`
   New CI script to enforce tool version freshness, following the pattern of 
check_when_dependencies_updated.py
   
   ````diff
   --- /dev/null
   +++ b/scripts/check_dockerfile_tool_versions.py
   @@ -0,0 +1,122 @@
   +#!/usr/bin/env python3
   +
   +# Licensed to the Apache Software Foundation (ASF) under one
   +# or more contributor license agreements.  See the NOTICE file
   +# distributed with this work for additional information
   +# regarding copyright ownership.  The ASF licenses this file
   +# to you under the Apache License, Version 2.0 (the
   +# "License"); you may not use this file except in compliance
   +# with the License.  You may obtain a copy of the License at
   +#
   +#   http://www.apache.org/licenses/LICENSE-2.0
   +#
   +# Unless required by applicable law or agreed to in writing,
   +# software distributed under the License is distributed on an
   +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
   +# KIND, either express or implied.  See the License for the
   +# specific language governing permissions and limitations
   +# under the License.
   +
   +"""Check that Dockerfile tool versions are not stale.
   +
   +Parses the ENV *_VERSION declarations in Dockerfile.alpine and looks up
   +their release dates via the GitHub API. Exits non-zero if any tool
   +exceeds the maximum age.
   +"""
   +
   +import datetime
   +import os
   +import pathlib
   +import re
   +import sys
   +from typing import Final
   +
   +import requests
   +
   +_MAX_AGE_DAYS: Final[int] = 90
   +_DOCKERFILE: Final[str] = "Dockerfile.alpine"
   +
   +# Maps ENV variable name to GitHub owner/repo
   +_TOOLS: Final[dict[str, str]] = {
   +    "SYFT_VERSION": "anchore/syft",
   +    "PARLAY_VERSION": "snyk/parlay",
   +    "SBOMQS_VERSION": "interlynk-io/sbomqs",
   +    "CDXCLI_VERSION": "CycloneDX/cyclonedx-cli",
   +}
   +
   +# Apache RAT is excluded: it uses Apache mirrors, not GitHub Releases.
   +# TODO: consider adding a check against https://dlcdn.apache.org/creadur/
   +
   +
   +def _parse_versions(dockerfile_path: pathlib.Path) -> dict[str, str]:
   +    """Extract ENV tool versions from the Dockerfile."""
   +    text = dockerfile_path.read_text(encoding="utf-8")
   +    versions: dict[str, str] = {}
   +    for env_var in _TOOLS:
   +        # Anchor at line start so commented-out ENV lines are not matched.
   +        match = re.search(rf"^[ \t]*ENV[ \t]+{env_var}=(\S+)", text, re.MULTILINE)
   +        if match:
   +            versions[env_var] = match.group(1).strip('"')
   +    return versions
   +
   +
   +def _get_release_date(repo: str, version: str, headers: dict[str, str]) -> datetime.datetime | None:
   +    """Query GitHub API for the release date of a given version."""
   +    # Try with and without 'v' prefix
   +    for tag in (f"v{version}", version):
   +        url = f"https://api.github.com/repos/{repo}/releases/tags/{tag}"
   +        try:
   +            resp = requests.get(url, headers=headers, timeout=10)
   +        except requests.RequestException as exc:
   +            # Network errors should produce a warning, not a failure.
   +            print(f"WARNING: Request to {url} failed: {exc}", file=sys.stderr)
   +            return None
   +        if resp.status_code == 200:
   +            published = resp.json().get("published_at", "")
   +            if published:
   +                return datetime.datetime.fromisoformat(published.replace("Z", "+00:00"))
   +    return None
   +
   +
   +def main() -> None:
   +    dockerfile_path = pathlib.Path(_DOCKERFILE)
   +    if not dockerfile_path.exists():
   +        print(f"ERROR: {_DOCKERFILE} not found", file=sys.stderr)
   +        sys.exit(1)
   +
   +    versions = _parse_versions(dockerfile_path)
   +    if not versions:
   +        print("ERROR: No tool versions found in Dockerfile", file=sys.stderr)
   +        sys.exit(1)
   +
   +    headers: dict[str, str] = {"Accept": "application/vnd.github+json"}
   +    token = os.environ.get("GITHUB_TOKEN")
   +    if token:
   +        headers["Authorization"] = f"Bearer {token}"
   +
   +    now = datetime.datetime.now(datetime.UTC)
   +    all_ok = True
   +
   +    for env_var, version in versions.items():
   +        repo = _TOOLS[env_var]
   +        release_date = _get_release_date(repo, version, headers)
   +
   +        if release_date is None:
   +            print(f"WARNING: Could not determine release date for {env_var}={version} ({repo})")
   +            continue
   +
   +        age_days = (now - release_date).days
   +        if age_days > _MAX_AGE_DAYS:
   +            print(
   +                f"ERROR: {env_var}={version} is {age_days} days old (max {_MAX_AGE_DAYS})",
   +                file=sys.stderr,
   +            )
   +            all_ok = False
   +        else:
   +            print(f"OK: {env_var}={version} is {age_days} days old (max {_MAX_AGE_DAYS})")
   +
   +    sys.exit(0 if all_ok else 1)
   +
   +
   +if __name__ == "__main__":
   +    main()
   ````
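
The parsing step can likewise be exercised without a real Dockerfile by running the same kind of regular expression over a string. A sketch, assuming pins are written as `ENV NAME=value` on their own line (the form the patch's regex targets); `parse_versions` and the sample text are hypothetical, and anchoring the pattern at the start of a line keeps a commented-out `# ENV ...` line from matching:

```python
import re


def parse_versions(text: str, names: tuple[str, ...]) -> dict[str, str]:
    """Return {NAME: value} for each `ENV NAME=value` line found in `text`."""
    versions: dict[str, str] = {}
    for name in names:
        # ^ with re.MULTILINE matches at every line start, and only spaces or
        # tabs may precede ENV, so lines starting with "#" cannot match.
        pattern = rf"^[ \t]*ENV[ \t]+{re.escape(name)}=(\S+)"
        match = re.search(pattern, text, flags=re.MULTILINE)
        if match:
            # Tolerate quoted values such as ENV SYFT_VERSION="1.2.3".
            versions[name] = match.group(1).strip('"')
    return versions


# Hypothetical Dockerfile excerpt; the versions are made up.
sample = """\
FROM alpine
# ENV SYFT_VERSION=0.0.1
ENV SYFT_VERSION="1.2.3"
ENV CDXCLI_VERSION=0.9.0
"""
found = parse_versions(sample, ("SYFT_VERSION", "CDXCLI_VERSION", "PARLAY_VERSION"))
print(found)  # {'SYFT_VERSION': '1.2.3', 'CDXCLI_VERSION': '0.9.0'}
```

Tools whose variable is absent (here `PARLAY_VERSION`) are simply omitted, so the caller can decide whether a missing pin is an error or a warning.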
   
   #### `.github/workflows/analyze.yml`
   Add a step to run the tool version freshness check in CI
   
   ````diff
   --- a/.github/workflows/analyze.yml
   +++ b/.github/workflows/analyze.yml
   @@ -52,3 +52,9 @@ jobs:
          - name: Run pre-commit
            run: |
              uv run --frozen pre-commit run --show-diff-on-failure --color=always --all-files
   +
   +      - name: Check Dockerfile tool versions
   +        env:
   +          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
   +        run: |
   +          uv run --frozen python scripts/check_dockerfile_tool_versions.py
   ````
   
   ### Open questions
   - The issue claims syft and cyclonedx-cli lack hash verification, but both 
already have SHA256 checks in the current Dockerfile. Should this issue be 
partially closed or updated to reflect reality?
   - For syft, the hash pins the install script rather than the final binary — 
is that considered sufficient, or should the downloaded binary also be pinned? 
The install script itself may verify the binary.
   - Apache RAT uses Apache mirrors and has SHA512 from upstream — should it 
also be included in the staleness check, and if so, how should the release date 
be determined (no GitHub Releases page)?
   - Should `requests` be added as a dev/script dependency in pyproject.toml, 
or should the script use only stdlib (urllib)?
   - What is the appropriate MAX_AGE_DAYS? The issue suggests 90 days, but the 
existing check_when_dependencies_updated.py uses 30 days for Python deps.
   
   ### Files examined
   - `Dockerfile.alpine`
   - `.github/workflows/analyze.yml`
   - `scripts/check_when_dependencies_updated.py`
   - `BUILD.md`
   - `.github/dependabot.yml`
   - `scripts/github_tag_dates.py`
   - `DEVELOPMENT.md`
   - `docker-compose.yml`
   
   ### Related issues
   This issue appears related to: #1067.
   
   _Both address missing update timeframes and monitoring for external 
dependencies (Docker tools vs npm)_
   
   ---
   *Draft from a triage agent. A human reviewer should validate before merging 
any change. The agent did not run tests or verify diffs apply.*


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

