[PR] Generate the VDR from per-CVE source files [logging-site]

via GitHub Fri, 24 Apr 2026 05:54:19 -0700


ppkarwasz opened a new pull request, #26:
URL: https://github.com/apache/logging-site/pull/26


   This change replaces our hand-maintained `src/site/static/cyclonedx/vdr.xml` 
with a generated artifact assembled from one source file per `(CVE, component)` 
pair under `src/vulnerabilities/`.
   
   To regenerate the VDR after editing any per-CVE file:
   
   ```
   uv run scripts/vdr_aggregate.py
   ```
   
   To split an existing monolithic VDR back into per-CVE files (one-time 
migration, or recovery):
   
   ```
   uv run scripts/vdr_split.py
   ```
   
   ## Why
   
   The current hand-edited VDR is becoming hard to maintain reliably:
   
   1. **Timestamps drift.** In the latest release we forgot to bump 
`metadata.timestamp` to the most recent `vulnerability.updated` value. The 
aggregator now computes this automatically.
   2. **Ordering is hard to keep straight.** Vulnerabilities in the file are 
not strictly sorted, and components are listed in an ad-hoc order. The 
aggregator enforces a deterministic order: vulnerabilities by 
`(year DESC, number DESC)`, components alphabetically by `bom-ref`.
   3. **Merge conflicts on simultaneous additions.** Adding seven 
vulnerabilities in a single batch (as in the most recent disclosure) is 
error-prone. Per-CVE files let contributors add or edit vulnerabilities 
independently.
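
The first two points can be sketched as follows. This is a minimal
illustration, not the code in `vdr_aggregate.py`: the helper names
`cve_sort_key` and `latest_timestamp` are hypothetical, and the sketch assumes
every `updated` value is an ISO 8601 timestamp with an explicit UTC offset
(or a trailing `Z`).

```python
import re
from datetime import datetime

CVE_RE = re.compile(r"^CVE-(\d{4})-(\d+)$")


def cve_sort_key(cve_id: str) -> tuple:
    """Sort key for (year DESC, number DESC): negate both numeric parts."""
    match = CVE_RE.match(cve_id)
    if match is None:
        raise ValueError(f"not a CVE id: {cve_id!r}")
    return (-int(match.group(1)), -int(match.group(2)))


def _parse(value: str) -> datetime:
    # Older Python versions do not accept a trailing "Z", so normalize it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def latest_timestamp(updated_values: list) -> str:
    """Return the most recent timestamp, as originally written."""
    return max(updated_values, key=_parse)
```

Sorting `["CVE-2017-5645", "CVE-2021-44228", "CVE-2018-1285"]` with
`key=cve_sort_key` puts the 2021 entry first and the 2017 entry last. For
components, a plain `sorted()` over the `bom-ref` strings is enough when the
refs are ASCII.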
   
   ## How it works
   
   Each vulnerability lives in its own file at 
`src/vulnerabilities/<CVE-id>/<component>.cdx.xml`: a self-contained CycloneDX 
1.7 BOM with the affected component as `metadata.component` and a single 
`<vulnerability>` element. `log4cxx-conan` never gets its own file; its 
vulnerabilities ride along in the corresponding `log4cxx` file via a 
`<components>` entry plus a `<dependencies>` edge.
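
As an illustration of the path convention, the sketch below maps a source path
back to its `(CVE, component)` pair. The function name `parse_source_path` is
hypothetical and is not taken from the scripts in this PR; it also assumes
that anything not inside a `CVE-*` directory (such as the template file) is
not a source file.

```python
from pathlib import PurePosixPath
from typing import Optional, Tuple

SUFFIX = ".cdx.xml"


def parse_source_path(path: str) -> Optional[Tuple[str, str]]:
    """Return (cve_id, component) for a per-CVE file, or None otherwise."""
    p = PurePosixPath(path)
    if not p.name.endswith(SUFFIX) or not p.parent.name.startswith("CVE-"):
        return None
    component = p.name[: -len(SUFFIX)]
    return (p.parent.name, component)
```

With this, `src/vulnerabilities/CVE-2017-5645/log4j-core.cdx.xml` yields
`("CVE-2017-5645", "log4j-core")`, while `src/vulnerabilities/template.cdx.xml`
yields `None`.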
   
   `vdr_aggregate.py` walks every per-CVE file, dedupes components by 
`bom-ref`, dedupes vulnerabilities by CVE id, and emits the monolithic 
`vdr.xml`. `vdr_split.py` performs the inverse for migration. Both scripts 
share `vdr_common.py` (constants, namespace handling, comparison, 
write-if-changed orchestration).
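
The deduplication step might look roughly like the sketch below. It is not
the actual `vdr_aggregate.py`: the `collect` and `sample_bom` helpers, the
placeholder CVE ids, and the plain-name `bom-ref` values are all assumptions
made for illustration, and the namespace URI is assumed to follow the usual
CycloneDX pattern. The sketch keeps the first occurrence of each key, whereas
a real aggregator would likely need to merge the `affects` entries of a CVE
that appears in several files.

```python
import xml.etree.ElementTree as ET

NS = "http://cyclonedx.org/schema/bom/1.7"  # assumed CycloneDX 1.7 namespace


def q(tag: str) -> str:
    """Qualify a tag name with the BOM namespace."""
    return f"{{{NS}}}{tag}"


def collect(sources):
    """Dedupe components by bom-ref and vulnerabilities by CVE id."""
    components, vulnerabilities = {}, {}
    for xml_text in sources:
        root = ET.fromstring(xml_text)
        # Covers both metadata.component and <components> entries.
        for comp in root.iter(q("component")):
            ref = comp.get("bom-ref")
            if ref is not None:
                components.setdefault(ref, comp)  # first occurrence wins
        for vuln in root.iter(q("vulnerability")):
            cve_id = vuln.findtext(q("id"))
            if cve_id is not None:
                vulnerabilities.setdefault(cve_id, vuln)
    return components, vulnerabilities


def sample_bom(cve_id, main_ref, extra_refs=()):
    """Build a tiny stand-in for a per-CVE file (illustrative only)."""
    extras = "".join(
        f'<component type="library" bom-ref="{ref}"/>' for ref in extra_refs
    )
    return (
        f'<bom xmlns="{NS}">'
        f'<metadata><component type="library" bom-ref="{main_ref}"/></metadata>'
        f"<components>{extras}</components>"
        f"<vulnerabilities><vulnerability><id>{cve_id}</id></vulnerability>"
        "</vulnerabilities></bom>"
    )


SOURCES = [
    sample_bom("CVE-0000-0002", "log4net"),
    sample_bom("CVE-0000-0001", "log4cxx", ["log4cxx-conan"]),
    sample_bom("CVE-0000-0002", "log4cxx", ["log4cxx-conan"]),
]
components, vulnerabilities = collect(SOURCES)
```

Three input files with overlapping components and CVE ids collapse to three
components and two vulnerabilities, which can then be sorted and serialized
into a single BOM.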
   
   ### Idempotent writes
   
   Both scripts read the existing output's `serialNumber` and `version`, build 
a candidate at the existing version, and compare it to the file on disk via a 
structural comparison that ignores comments, inter-element whitespace, and 
namespace prefixes. If the candidate is equivalent, the file is left untouched: 
no diff, no version churn. If it differs, the version is bumped by one and the 
file is rewritten.
   
   This means re-running either script in a clean tree is a no-op, and a 
content edit produces exactly one version bump per affected file.
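
A minimal sketch of the compare-then-bump logic, under stated assumptions:
the names `canonical` and `plan_write` are hypothetical stand-ins rather than
the helpers in `vdr_common.py`, the `build_candidate` callback is an invented
interface, `serialNumber` handling is omitted, and the BOM is assumed to
contain no mixed content (so trailing text after child elements can be
ignored).

```python
import xml.etree.ElementTree as ET
from typing import Callable, Optional


def canonical(elem):
    """Reduce an element to a nested tuple that ignores formatting.

    The default ElementTree parser already drops comments and resolves
    namespace prefixes to URIs, so only whitespace is normalized here.
    """
    text = (elem.text or "").strip()
    attrs = tuple(sorted(elem.attrib.items()))
    children = tuple(canonical(child) for child in elem)
    return (elem.tag, attrs, text, children)


def plan_write(
    existing_xml: Optional[str],
    build_candidate: Callable[[int], str],
) -> Optional[str]:
    """Return the XML to write, or None if the file can stay untouched."""
    if existing_xml is None:
        return build_candidate(1)
    current = ET.fromstring(existing_xml)
    version = int(current.get("version", "1"))
    # Build at the existing version so that version alone never differs.
    candidate = build_candidate(version)
    if canonical(ET.fromstring(candidate)) == canonical(current):
        return None
    return build_candidate(version + 1)
```

Building the candidate at the existing version is what makes a re-run a
no-op: the version only changes once some other part of the document differs.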
   
   ## Why split per (CVE, component), beyond automation
   
   1. **Path to VEX.** A monolithic VDR has no meaningful `metadata.component`, 
since it covers many subjects. Per-component files let `metadata.component` 
name the analyzing project (e.g. `log4j-core`), with the vulnerable dependency 
in `vulnerability.affects` and the dependency path in `<dependencies>`. That's 
the shape required for VEX, CSAF, and OpenVEX, so we can grow into those 
formats without restructuring our source of truth.
   2. **Easier AsciiDoc generation.** Per-CVE files let `_vulnerabilities.adoc` 
be assembled from one generated partial per CVE, instead of being maintained as 
a single monolithic AsciiDoc file. We can also decide later to have a separate 
page per CVE.
   
   ## Repository layout
   
   ```
   scripts/
     vdr_common.py        # shared helpers (constants, clone, serialize, equivalent, write_bom_if_changed)
     vdr_aggregate.py     # per-CVE files -> vdr.xml
     vdr_split.py         # vdr.xml -> per-CVE files
   src/vulnerabilities/
     CVE-2017-5645/log4j-core.cdx.xml
     CVE-2018-1285/log4net.cdx.xml
     ...
     template.cdx.xml     # editable template for new CVEs
   src/site/static/cyclonedx/vdr.xml   # generated output
   ```
   
   ## Notes for reviewers
   
   - The aggregated `vdr.xml` is now CycloneDX 1.7 (was 1.6). The bump is 
intentional: 1.x is semantically versioned, and the structural change is just a 
namespace rename.
   - Component ordering changed: alphabetical by `bom-ref`, so `log4j-1.2-api` 
now precedes `log4j-core`.
   - The aggregated header comment now warns that the file is generated and 
points at `uv run scripts/vdr_aggregate.py` for updates.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
