bobbai00 opened a new pull request, #5417:
URL: https://github.com/apache/texera/pull/5417

   ### What changes were proposed in this PR?
   
   Auto-generates each module's `NOTICE-binary` from the third-party 
`META-INF/NOTICE` files in its bundled jars, replacing the hand-curated 
subsets introduced in #4668, and adds a CI drift check so the committed files 
cannot silently go stale when dependencies change.
   
   - **New generator, `bin/licensing/generate_notice_binary.py`:** walks a 
module's dist `lib/` dir, extracts every `META-INF/NOTICE` (and root-level 
`NOTICE`) from each bundled jar, and skips first-party `org.apache.texera.*` 
jars. It dedupes by content hash, so jars sharing an upstream notice collapse 
into one block. The output starts with the project's own root `NOTICE`, 
followed by one block per unique notice, each with a synthesized heading and 
the list of contributing jars. Output is deterministic: line endings are 
normalized (CRLF→LF) and blocks are stably sorted by jar count. An optional 
`--extras <file>` appends non-jar attributions.
   - **`amber/NOTICE-binary-extras` (new):** the aiohttp + Matplotlib notices, 
which ship as Python wheels (not jars) and so can't be extracted from the 
`lib/` dir.
   - **6 per-module `NOTICE-binary` files regenerated** from the actual bundled 
jars: `amber`, `access-control-service`, `config-service`, `file-service`, 
`computing-unit-managing-service`, `workflow-compiling-service`.
   - **CI drift-check (`build.yml`):** after each dist is built and unzipped, a 
new step regenerates that module's `NOTICE-binary` and diffs it against the 
committed file, failing the build with a one-line fix-up command on any drift. 
The amber check runs in the scala job; the five platform services are each 
checked in the per-service `platform` matrix job, alongside the existing 
`LICENSE-binary` check.
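
The generator's extract, dedupe, and render steps described above can be sketched roughly as follows. This is a minimal illustration, not the actual `generate_notice_binary.py`: the function names, the heading format, the choice of SHA-256, and the descending sort direction are all assumptions made for the example.

```python
import hashlib
import zipfile
from pathlib import Path

# Assumed for illustration; the real script may look at other paths or names.
NOTICE_PATHS = ("META-INF/NOTICE", "NOTICE")
FIRST_PARTY_PREFIX = "org.apache.texera."


def normalize(text: str) -> str:
    """Convert CRLF/CR to LF and end the text with exactly one newline."""
    return text.replace("\r\n", "\n").replace("\r", "\n").strip() + "\n"


def collect_notices(lib_dir: Path) -> dict[str, tuple[str, list[str]]]:
    """Map content hash -> (notice text, contributing jar names)."""
    notices: dict[str, tuple[str, list[str]]] = {}
    for jar in sorted(lib_dir.glob("*.jar")):  # sorted for determinism
        if jar.name.startswith(FIRST_PARTY_PREFIX):
            continue  # skip first-party jars
        with zipfile.ZipFile(jar) as zf:
            names = set(zf.namelist())
            for entry in NOTICE_PATHS:
                if entry not in names:
                    continue
                text = normalize(zf.read(entry).decode("utf-8", errors="replace"))
                digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
                _, jars = notices.setdefault(digest, (text, []))
                if jar.name not in jars:
                    jars.append(jar.name)
    return notices


def render(root_notice: str, notices, extras: str = "") -> str:
    """Root NOTICE first, then one block per unique upstream notice."""
    blocks = [normalize(root_notice)]
    # Assumed order: most widely shared notice first, ties broken by jar name.
    ordered = sorted(notices.values(), key=lambda n: (-len(n[1]), n[1][0]))
    for text, jars in ordered:
        heading = f"=== Notice from {len(jars)} jar(s): {', '.join(jars)} ==="
        blocks.append(heading + "\n" + text)
    if extras:
        blocks.append(normalize(extras))  # non-jar attributions
    return "\n".join(blocks)
```

Hashing the normalized text, rather than the raw bytes, is what lets two jars that carry the same notice with different line endings collapse into a single block.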
   
   `LICENSE-binary` stays hand-maintained (it needs human judgment on each 
license); only `NOTICE-binary`, a mechanical carry-forward of upstream 
notices, is generated. A future dependency bump that changes the bundled 
notices therefore fails CI and prints the exact command to regenerate, instead 
of letting the committed file drift silently.
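
A drift check of this kind can be sketched as below. It is a hypothetical stand-alone helper, not the actual step in `build.yml`: the function name `check_drift` and the `fix_command` parameter are assumptions, and the real workflow may simply invoke the generator and a shell `diff`.

```python
import difflib
from pathlib import Path


def check_drift(committed: Path, generated_text: str, fix_command: str) -> int:
    """Return 0 if the committed file matches the generated text, else 1."""
    # Read bytes and decode so no newline translation hides a difference.
    committed_text = (
        committed.read_bytes().decode("utf-8") if committed.exists() else ""
    )
    if committed_text == generated_text:
        return 0
    diff = difflib.unified_diff(
        committed_text.splitlines(keepends=True),
        generated_text.splitlines(keepends=True),
        fromfile=str(committed),
        tofile="generated",
    )
    print("".join(diff), end="")
    # fix_command is supplied by the caller; its exact form is not assumed here.
    print(f"\n{committed} is out of date. Regenerate with:\n  {fix_command}")
    return 1
```

A CI step would pass the return value to `sys.exit`, so any mismatch (including a missing committed file) fails the job while the printed diff and command tell the contributor how to fix it.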
   
   ### Any related issues, documentation, discussions?
   
   Closes #4674
   
   Builds on #4668 (already merged). Slated for the v1.2 milestone, per the 
issue discussion. ASF guidance: https://infra.apache.org/licensing-howto.html 
(Apache-2.0 §4(d)).
   
   ### How was this PR tested?
   
   - Built all six module dists locally (`sbt <project>/Universal/stage`) and 
ran the generator against each freshly-built `lib/`; the committed 
`NOTICE-binary` files are byte-identical to the generator output, so the new CI 
drift-check passes for every module.
   - Verified the existing `LICENSE-binary` checks (`check_binary_deps.py`, PR 
mode) still pass against the same libs for all six modules.
   - Validated that `build.yml` is well-formed YAML.
   
   ### Was this PR authored or co-authored using generative AI tooling?
   
   Generated-by: Claude Opus 4.8 (1M context)
   

