The GitHub Actions job "Required Checks" on texera.git/gh-readonly-queue/main/pr-5417-cd6053567094fcab1dfffd91e070411f64769d14 has failed. Run started by GitHub user bobbai00 (triggered by bobbai00).
Head commit for run:
38454a56c473de5fc066bb52083ccda701359ead / Jiadong Bai <[email protected]>
chore(licensing): add per-module NOTICE-binary generation script and CI checks for detecting NOTICE-binary drifting (#5417)

### What changes were proposed in this PR?

Auto-generates each module's `NOTICE-binary` from the third-party `META-INF/NOTICE` files in its bundled jars — replacing the hand-curated subsets introduced in #4668 — and adds a CI drift-check so the committed files can never silently rot when dependencies change.

- **New generator — `bin/licensing/generate_notice_binary.py`:** walks a module's dist `lib/` dir, extracts every `META-INF/NOTICE` (and root-level `NOTICE`) from each bundled jar, skips first-party `org.apache.texera.*` jars, dedupes by content hash so jars sharing an upstream notice collapse into one block, prepends the project's own root `NOTICE`, and emits one block per unique notice with a synthesized heading + the contributing-jar list. Output is deterministic (CRLF→LF normalized, stably sorted by jar-count). An optional `--extras <file>` appends non-jar attributions.
- **`amber/NOTICE-binary-extras` (new):** the aiohttp + Matplotlib notices, which ship as Python wheels (not jars) and so can't be extracted from the `lib/` dir.
- **6 per-module `NOTICE-binary` files regenerated** from the actual bundled jars: `amber`, `access-control-service`, `config-service`, `file-service`, `computing-unit-managing-service`, `workflow-compiling-service`.
- **CI drift-check (`build.yml`):** after each dist is built and unzipped, a new step regenerates that module's `NOTICE-binary` and diffs it against the committed file, failing the build with a one-line fix-up command on any drift. The amber check runs in the scala job; the five platform services are each checked in the per-service `platform` matrix job, alongside the existing `LICENSE-binary` check.
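
To illustrate the dedupe-and-sort behaviour described in the generator bullet above, here is a minimal sketch. It is not the actual `bin/licensing/generate_notice_binary.py`. The function names, the file-name prefix test for first-party jars, the descending jar-count order, and the `Bundled jars:` heading are all assumptions made for illustration.

```python
import hashlib
import zipfile
from pathlib import Path

# Assumption: first-party jars are recognised by a file-name prefix.
FIRST_PARTY_PREFIX = "org.apache.texera."
NOTICE_NAMES = ("META-INF/NOTICE", "NOTICE")


def normalize(text):
    """Convert CRLF (and stray CR) to LF and trim surrounding whitespace."""
    return text.replace("\r\n", "\n").replace("\r", "\n").strip()


def read_notices(lib_dir):
    """Yield (jar_name, notice_text) for each third-party jar that carries a notice."""
    for jar in sorted(Path(lib_dir).glob("*.jar")):
        if jar.name.startswith(FIRST_PARTY_PREFIX):
            continue  # skip the project's own jars
        with zipfile.ZipFile(jar) as zf:
            members = set(zf.namelist())
            for name in NOTICE_NAMES:
                if name in members:
                    raw = zf.read(name).decode("utf-8", "replace")
                    yield jar.name, normalize(raw)


def group_notices(pairs):
    """Collapse identical notices into one block keyed by content hash."""
    groups = {}
    for jar_name, text in pairs:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        groups.setdefault(digest, (text, set()))[1].add(jar_name)
    blocks = [(text, sorted(jars)) for text, jars in groups.values()]
    # Assumed order: most-shared notices first, ties broken by first jar name,
    # so the result does not depend on directory listing order.
    blocks.sort(key=lambda b: (-len(b[1]), b[1][0]))
    return blocks


def render(root_notice, blocks):
    """Prepend the project's root NOTICE, then emit one block per unique notice."""
    parts = [normalize(root_notice)]
    for text, jars in blocks:
        # Hypothetical heading format: just the contributing-jar list.
        parts.append("Bundled jars: " + ", ".join(jars) + "\n\n" + text)
    return "\n\n".join(parts) + "\n"
```

Calling `render(root_text, group_notices(read_notices("lib")))` twice on the same `lib/` directory yields identical text, which is the property a drift-check that diffs against a committed file relies on.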

`LICENSE-binary` stays hand-maintained (it needs human judgment on each license); only `NOTICE-binary` — a mechanical carry-forward of upstream notices — is generated. So future dep bumps fail CI with the exact command to regenerate, instead of silently drifting.

### Any related issues, documentation, discussions?

Closes #4674

ASF guidance: https://infra.apache.org/licensing-howto.html (Apache-2.0 §4(d)).

### How was this PR tested?

- Built all six module dists locally (`sbt <project>/Universal/stage`) and ran the generator against each freshly-built `lib/`; the committed `NOTICE-binary` files are byte-identical to the generator output, so the new CI drift-check passes for every module.
- Verified the existing `LICENSE-binary` checks (`check_binary_deps.py`, PR mode) still pass against the same libs for all six modules.
- `build.yml` validated as well-formed YAML.

### Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.8 (1M context)

---------

Co-authored-by: Bob Bai <[email protected]>

Report URL: https://github.com/apache/texera/actions/runs/27273904225

With regards,
GitHub Actions via GitBox
