bobbai00 opened a new pull request, #4693:
URL: https://github.com/apache/texera/pull/4693
### What changes were proposed in this PR?
This PR relaxes the per-PR license-binary check so transitive-only version
bumps no longer block unrelated PRs, while still enforcing the parts of the
check that need legal review. Sub-task implementation for #4691; broader
context in #4688.
**Script (`bin/licensing/check_binary_deps.py`).**
- New flag `--ignore-transitive-version`. Without it, behavior is unchanged
(exact match).
- Added direct-dependency loaders per ecosystem, reading the primary
  requirement files:
  - `python` → `amber/requirements.txt` (PEP 503 canonical names)
  - `npm` → `frontend/package.json` (`dependencies` + `devDependencies` +
    `peer*` + `optional*`)
  - `agent-npm` → `agent-service/package.json`
  - `jar` → every `*.sbt` and `Dependencies.scala` in the repo
- Refactored the diff to surface four classes instead of just two:
  - `added` (new package not claimed): **always fails**
  - `stale` (claimed but no longer bundled): **always fails**
  - `drift_direct` (claimed direct dep, version changed): **always fails**
    (a version bump can carry a license change)
  - `drift_transitive` (claimed transitive dep, version changed): **fails
    by default; informational with `--ignore-transitive-version`**
- For jars, the script bridges sbt-native-packager's
`<groupId>.<artifactId>-<version>.jar` naming and SBT's bare artifactId by
matching the trailing artifact segment after the last `.`, with Scala-version
suffix stripping for `%%`/`%%%` libs.
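
For reviewers who want a concrete picture of the jar bridging in the last bullet, here is a minimal sketch. It is not the code in `check_binary_deps.py`: the names `artifact_key` and `is_direct_jar`, both regexes, and the example filenames and versions are assumptions for illustration. It assumes the version starts at the first `-` followed by a digit, and it strips the Scala suffix before taking the segment after the last `.`, since `_2.13` itself contains a `.`.

```python
import re

# Assumed Scala cross-version suffixes: "_2.13", "_3", optionally preceded by
# a Scala.js platform part such as "_sjs1" (as produced by %%%).
_SCALA_SUFFIX = re.compile(r"(?:_sjs\d+(?:\.\d+)?)?_(?:2\.\d+|3)$")

# Assumed name/version boundary: the first "-" that is followed by a digit.
_JAR_NAME = re.compile(r"^(?P<name>.+?)-(?P<version>\d.*)\.jar$")


def artifact_key(jar_filename: str) -> tuple[str, str]:
    """Return (bare artifactId, version) for a packaged jar filename."""
    match = _JAR_NAME.match(jar_filename)
    if match is None:
        raise ValueError(f"unrecognized jar name: {jar_filename}")
    # Drop the Scala suffix first, then keep what follows the last ".".
    name = _SCALA_SUFFIX.sub("", match.group("name"))
    artifact = name.rsplit(".", 1)[-1]
    return artifact, match.group("version")


def is_direct_jar(jar_filename: str, direct_artifact_ids: set[str]) -> bool:
    """True if the jar's bare artifactId appears among the SBT direct deps."""
    artifact, _version = artifact_key(jar_filename)
    return artifact in direct_artifact_ids
```

Under these assumptions, `artifact_key("org.typelevel.cats-core_2.13-2.9.0.jar")` yields `("cats-core", "2.9.0")`, which can then be compared against the bare artifactIds collected from the `*.sbt` files. Artifact names that themselves contain `-<digit>` would need extra handling that this sketch does not attempt.
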
**CI (`.github/workflows/build.yml`).**
- All four `check_binary_deps.py` invocations (frontend npm, jar, python,
agent-npm) now pass `--ignore-transitive-version`.
The exact-match check is preserved as the default and will be reused by the
planned nightly job (sub-task #4692) so transitive drift remains visible and
actionable on `main`.
### Any related issues, documentation, discussions?
Resolves #4691. Sibling sub-task: #4692 (nightly exact-match job). Original
report: #4688.
### How was this PR tested?
End-to-end smoke tests were run locally against the real combined
LICENSE-binary built via `concat_license_binary.py` (113 python claims, 112 npm
claims, 566 jar claims). For each ecosystem, a clean run and each of the four
diff classes were exercised by mutating a synthetic `pip-licenses.csv` /
`3rdpartylicenses.json` / `lib/` directory:
| Scenario | Strict (default) | `--ignore-transitive-version` |
| --- | --- | --- |
| Clean run | exit 0 | exit 0 |
| Transitive version drift only | exit 1 | exit 0 (informational) |
| Direct version drift | exit 1 | exit 1 |
| Stale claim (package missing) | exit 1 | exit 1 |
| Added package | exit 1 | exit 1 |
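
The table follows directly from the four diff classes. The sketch below shows one way to express that mapping; it is an illustration, not the script's actual code. The `Diff` dataclass, the `classify` and `exit_code` function names, and the name-to-version dictionaries are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Diff:
    added: list = field(default_factory=list)
    stale: list = field(default_factory=list)
    drift_direct: list = field(default_factory=list)
    drift_transitive: list = field(default_factory=list)


def classify(claimed: dict, bundled: dict, direct: set) -> Diff:
    """Compare claimed vs. bundled {name: version} maps.

    `direct` is the set of names declared as direct dependencies.
    """
    diff = Diff()
    diff.added = sorted(bundled.keys() - claimed.keys())
    diff.stale = sorted(claimed.keys() - bundled.keys())
    for name in sorted(claimed.keys() & bundled.keys()):
        if claimed[name] != bundled[name]:
            bucket = diff.drift_direct if name in direct else diff.drift_transitive
            bucket.append(name)
    return diff


def exit_code(diff: Diff, ignore_transitive_version: bool = False) -> int:
    """Return 1 if any blocking class is non-empty, else 0."""
    blocking = bool(diff.added or diff.stale or diff.drift_direct)
    if not ignore_transitive_version:
        # Strict mode: transitive drift blocks as well.
        blocking = blocking or bool(diff.drift_transitive)
    return 1 if blocking else 0
```

With the flag set, only `drift_transitive` stops affecting the exit code. The entries are still collected, so a caller can print them as informational output.
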
Direct-dep loaders were verified against the real repo: 33 python direct deps,
115 frontend npm, 15 agent-service npm, and 101 SBT artifactIds. Known direct
deps (`numpy`, `@angular/core`, `@ai-sdk/openai`, `netty-all`,
`jersey-common`) classify as direct, and the existing per-module
LICENSE-binary files split into a non-empty direct subset and a non-empty
transitive subset for each ecosystem (jar: 105 direct / 461 transitive;
npm: 53 / 59).
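
As a rough illustration of what the python and npm direct-dep loaders have to do, here is a minimal sketch. It is not the script's code: the function names, the simplified requirement-line parsing, and the reading of `peer*` / `optional*` as `peerDependencies` / `optionalDependencies` are assumptions. The name normalization itself is the one PEP 503 specifies.

```python
import json
import re

_REQ_NAME = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*")

# Assumed expansion of the sections listed in the PR description.
_NPM_SECTIONS = (
    "dependencies",
    "devDependencies",
    "peerDependencies",
    "optionalDependencies",
)


def canonical_name(name: str) -> str:
    """PEP 503 normalization: collapse runs of -, _, . to '-' and lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


def python_direct_deps(requirements_text: str) -> set[str]:
    """Collect canonical project names from requirements.txt content."""
    names = set()
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        match = _REQ_NAME.match(line)
        if match:
            names.add(canonical_name(match.group(0)))
    return names


def npm_direct_deps(package_json_text: str) -> set[str]:
    """Collect package names from the dependency sections of package.json."""
    manifest = json.loads(package_json_text)
    names = set()
    for section in _NPM_SECTIONS:
        names.update(manifest.get(section) or {})
    return names
```

Both functions take file contents rather than paths, so they can be checked against small inline fixtures without touching the repo.
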
A reproducer is included in the issue thread; happy to add a regression test
under a new `bin/licensing/tests/` directory if reviewers prefer that over the
inline smoke test.
### Was this PR authored or co-authored using generative AI tooling?
Generated-by: Claude Code (claude-opus-4-7)