The GitHub Actions job "Required Checks" on texera.git/gh-readonly-queue/main/pr-5417-cd6053567094fcab1dfffd91e070411f64769d14 has failed.
Run started by GitHub user bobbai00 (triggered by bobbai00).

Head commit for run:
38454a56c473de5fc066bb52083ccda701359ead / Jiadong Bai 
<[email protected]>
chore(licensing): add per-module NOTICE-binary generation script and CI checks 
for detecting NOTICE-binary drifting (#5417)

### What changes were proposed in this PR?

Auto-generates each module's `NOTICE-binary` from the third-party
`META-INF/NOTICE` files in its bundled jars — replacing the hand-curated
subsets introduced in #4668 — and adds a CI drift-check so the committed
files can never silently rot when dependencies change.

- **New generator — `bin/licensing/generate_notice_binary.py`:** walks a
module's dist `lib/` dir, extracts every `META-INF/NOTICE` (and
root-level `NOTICE`) from each bundled jar, skips first-party
`org.apache.texera.*` jars, dedupes by content hash so jars sharing an
upstream notice collapse into one block, prepends the project's own root
`NOTICE`, and emits one block per unique notice with a synthesized
heading + the contributing-jar list. Output is deterministic (CRLF→LF
normalized, stably sorted by jar-count). An optional `--extras <file>`
appends non-jar attributions.
- **`amber/NOTICE-binary-extras` (new):** the aiohttp + Matplotlib
notices, which ship as Python wheels (not jars) and so can't be
extracted from the `lib/` dir.
- **6 per-module `NOTICE-binary` files regenerated** from the actual
bundled jars: `amber`, `access-control-service`, `config-service`,
`file-service`, `computing-unit-managing-service`,
`workflow-compiling-service`.
- **CI drift-check (`build.yml`):** after each dist is built and
unzipped, a new step regenerates that module's `NOTICE-binary` and diffs
it against the committed file, failing the build with a one-line fix-up
command on any drift. The amber check runs in the scala job; the five
platform services are each checked in the per-service `platform` matrix
job, alongside the existing `LICENSE-binary` check.
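
The extract, dedupe, and sort steps described for the generator can be sketched roughly as follows. This is a minimal illustration, not the actual `bin/licensing/generate_notice_binary.py`: the function names, the jar-filename prefix test for first-party jars, the SHA-256 hash, and the descending jar-count order are all assumptions made for the example.

```python
import hashlib
import io
import zipfile
from collections import defaultdict

# Hypothetical constants; the real script may detect these differently.
NOTICE_PATHS = ("META-INF/NOTICE", "NOTICE")
FIRST_PARTY_PREFIX = "org.apache.texera."


def normalize(text: str) -> str:
    """Normalize CRLF to LF and trailing whitespace so hashes are stable."""
    return text.replace("\r\n", "\n").strip() + "\n"


def collect_notices(jars):
    """Group jars by the content hash of their NOTICE text.

    `jars` is an iterable of (jar_name, jar_bytes) pairs.
    Returns a list of (notice_text, [jar_names]) blocks.
    """
    by_hash = defaultdict(lambda: {"text": None, "jars": []})
    for name, data in jars:
        if name.startswith(FIRST_PARTY_PREFIX):
            continue  # skip first-party jars (assumed to be named by prefix)
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            members = set(zf.namelist())
            for path in NOTICE_PATHS:
                if path not in members:
                    continue
                raw = zf.read(path).decode("utf-8", errors="replace")
                text = normalize(raw)
                digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
                entry = by_hash[digest]
                entry["text"] = text
                if name not in entry["jars"]:
                    entry["jars"].append(name)
    blocks = [(e["text"], sorted(e["jars"])) for e in by_hash.values()]
    # Assumed ordering: most jars first, ties broken by first jar name,
    # so the output does not depend on directory iteration order.
    blocks.sort(key=lambda b: (-len(b[1]), b[1][0]))
    return blocks


def make_jar(files):
    """Build an in-memory jar (zip) from {path: content}; for trying the sketch."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for path, content in files.items():
            zf.writestr(path, content)
    return buf.getvalue()
```

Two jars whose notices differ only in line endings collapse into one block here, which is the behaviour the dedupe-by-content-hash step is meant to provide.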

`LICENSE-binary` stays hand-maintained (it needs human judgment on each
license); only `NOTICE-binary`, a mechanical carry-forward of upstream
notices, is generated. As a result, a future dependency bump that changes
the bundled notices fails CI and prints the exact command to regenerate,
instead of letting the committed file drift silently.
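
The drift-check itself amounts to comparing the committed file with freshly generated output. A minimal sketch, assuming a hypothetical `check_drift` helper and a placeholder fix-up command (the actual step lives in `build.yml` and may be implemented differently):

```python
import difflib


def check_drift(committed: str, generated: str, fix_command: str):
    """Compare committed and generated NOTICE-binary text.

    Returns (ok, message). On drift, the message holds a unified diff
    followed by the command the contributor should run to regenerate.
    `fix_command` is a placeholder supplied by the caller.
    """
    if committed == generated:
        return True, ""
    diff = difflib.unified_diff(
        committed.splitlines(keepends=True),
        generated.splitlines(keepends=True),
        fromfile="NOTICE-binary (committed)",
        tofile="NOTICE-binary (generated)",
    )
    message = (
        "".join(diff)
        + "\nNOTICE-binary is out of date. Regenerate with:\n"
        + f"  {fix_command}\n"
    )
    return False, message
```

A CI step built on this would exit non-zero when `ok` is false and print `message`, so the failure log carries both the diff and the fix.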

### Any related issues, documentation, discussions?

Closes #4674

ASF guidance: https://infra.apache.org/licensing-howto.html (Apache-2.0
§4(d)).

### How was this PR tested?

- Built all six module dists locally (`sbt <project>/Universal/stage`)
and ran the generator against each freshly-built `lib/`; the committed
`NOTICE-binary` files are byte-identical to the generator output, so the
new CI drift-check passes for every module.
- Verified the existing `LICENSE-binary` checks (`check_binary_deps.py`,
PR mode) still pass against the same libs for all six modules.
- `build.yml` validated as well-formed YAML.

### Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.8 (1M context)

---------

Co-authored-by: Bob Bai <[email protected]>

Report URL: https://github.com/apache/texera/actions/runs/27273904225

With regards,
GitHub Actions via GitBox
