bobbai00 opened a new pull request, #4451:
URL: https://github.com/apache/texera/pull/4451

   ### What changes were proposed in this PR?
   
   Add a CI workflow and two helper scripts that keep `LICENSE-binary` and 
`NOTICE-binary` in sync with the dependencies that are actually bundled, across 
JVM jars, npm packages, and Python packages.
   
   - **`.github/workflows/check-binary-licenses.yml`** — runs on PRs touching 
`**/build.sbt`, `project/plugins.sbt`, `project/AddMetaInfLicenseFiles.scala`, 
`frontend/package.json`, `frontend/yarn.lock`, `amber/requirements.txt`, 
`amber/operator-requirements.txt`, `LICENSE-binary`, `NOTICE-binary`, 
`bin/licensing/**`, or the workflow itself. Three jobs:
     - **check-jvm-deps**: `sbt dist` for every dist-producing module, unzip 
the `lib/` directories, run the JVM checker.
     - **check-npm-deps**: run the frontend production build (emits 
`3rdpartylicenses.txt`), run the npm checker.
     - **check-python-deps**: install the Python requirements, run 
`pip-licenses` to produce a CSV, run the Python checker.
   
   - **`bin/licensing/check_binary_deps.py`** — the checker. Parses bullets 
inside `LICENSE-binary` per ecosystem and reports **ADDED** (bundled but not 
claimed) and **STALE** (claimed but no longer bundled) with remediation hints. 
The checker does not validate how entries are categorized; that is left to manual review.
   
   - **`bin/licensing/collect_binary_licenses.sh`** — maintainer helper. 
Enumerates the currently-bundled dependencies across the three ecosystems to 
seed or refresh `LICENSE-binary`.
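   
   For reviewers, the ADDED / STALE report described above amounts to two set differences. The snippet below is a minimal sketch of that idea, not the code in `check_binary_deps.py`: the bullet pattern, the `group:artifact:version` coordinate shape, and the names `claimed_from_license` and `diff_deps` are all assumptions made for illustration.

```python
import re

# Assumed bullet shape: "  - <coordinate> ..."; the real LICENSE-binary layout may differ.
BULLET = re.compile(r"^\s*[-*]\s+(\S+)")


def claimed_from_license(text):
    """Collect the first token of every bullet line (hypothetical parser)."""
    return {m.group(1) for line in text.splitlines() if (m := BULLET.match(line))}


def diff_deps(bundled, claimed):
    """Return (added, stale): bundled-but-unclaimed and claimed-but-unbundled."""
    added = sorted(set(bundled) - set(claimed))
    stale = sorted(set(claimed) - set(bundled))
    return added, stale


# Made-up sample data, not taken from the real files.
license_text = """\
Apache License 2.0
  - com.example:alpha:1.0
  - com.example:beta:2.1
"""
bundled = {"com.example:alpha:1.0", "com.example:gamma:0.3"}

added, stale = diff_deps(bundled, claimed_from_license(license_text))
print("ADDED:", added)
print("STALE:", stale)
```

   With this sample, `com.example:gamma:0.3` is reported as ADDED and `com.example:beta:2.1` as STALE. Sorting both lists keeps the CI output stable between runs.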
   
   ### Any related issues, documentation, discussions?
   
   Closes #4450. Related to #4387 (LICENSE-binary / NOTICE-binary content) and 
#4449 (dist-zip packaging).
   
   **Sequencing note**: this PR depends on `LICENSE-binary` and `NOTICE-binary` 
being present at the repo root, which #4387 adds. Merge after #4387 lands; the 
workflow will otherwise fail with a clear error from the checker.
   
   Follow-up (out of scope here): teach `check_binary_deps.py` to skip jars 
whose groupId starts with `org.apache.texera.` so Texera's own jars are not 
flagged as third-party deps. That change lands once #4447 (groupId rename) 
merges.
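   
   One possible shape for that follow-up filter is sketched below. It assumes the checker sees each jar as a `groupId:artifactId:version` string, which this PR does not confirm; `is_first_party`, `third_party_only`, and the sample coordinates are hypothetical.

```python
# Prefix taken from the follow-up note; the trailing dot means a groupId of
# exactly "org.apache.texera" would NOT be treated as first-party.
TEXERA_PREFIX = "org.apache.texera."


def is_first_party(group_id, prefix=TEXERA_PREFIX):
    """True when the groupId belongs to Texera itself."""
    return group_id.startswith(prefix)


def third_party_only(coords):
    """Drop first-party jars from a list of assumed 'groupId:artifactId:version' strings."""
    return [c for c in coords if not is_first_party(c.split(":", 1)[0])]


# Made-up coordinates for illustration only.
sample = [
    "org.apache.texera.example:some-module:1.0",
    "com.example:lib:1.0",
]
print(third_party_only(sample))
```

   Matching on the groupId rather than the jar file name avoids misclassifying third-party jars whose names happen to contain "texera".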
   
   ### How was this PR tested?
   
   The checker was exercised against the reviewer spreadsheet that informed 
`LICENSE-binary`'s content, confirming it reports the expected ADDED / STALE 
sets for deliberate mismatches.
   
   ### Was this PR authored or co-authored using generative AI tooling?
   
   Generated-by: Claude Code (Claude Opus 4.7)

