bobbai00 opened a new pull request, #4711:
URL: https://github.com/apache/texera/pull/4711
### What changes were proposed in this PR?
Fixes a latent bug in `bin/licensing/check_binary_deps.py` — the internal
indexers used `dict[name, version]` (and `dict[artifact, (version, basename)]`
for jars), so when the same name appeared with two different versions the
second assignment silently overwrote the first. Concretely, on `main` today:
- The combined LICENSE-binary has **97 artifacts claimed at multiple
versions** (e.g. `org.eclipse.jetty.jetty-server` at `9.4.20.v20190813` and
`11.0.20`, `io.grpc.grpc-api` at `1.60.0` and `1.62.2`,
`org.apache.hadoop.hadoop-auth` at `3.3.1` and `3.3.3`, ...).
- `_index_jar` collapses 566 entries → 460 — **106 entries are silently
dropped**, including 6 of the 7 `netty-tcnative-boringssl-static` per-platform
classifiers.
Current CI cannot detect this because #4632 split each ecosystem across
services, so any single CI invocation sees only one version per library. The
script itself is still wrong, and the multi-version claims in the per-module
LICENSE-binary files are not actually being validated against what is bundled.
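For reviewers who want to see the overwrite in isolation, here is a minimal sketch. It is not the script's code: `index_flat` and the `claims` list are hypothetical stand-ins, assuming the old indexers reduce to a plain `name -> version` assignment as described above.

```python
# Hypothetical (name, version) claims, using versions quoted in this PR.
claims = [
    ("io.grpc.grpc-api", "1.60.0"),
    ("io.grpc.grpc-api", "1.62.2"),
    ("org.apache.hadoop.hadoop-auth", "3.3.1"),
    ("org.apache.hadoop.hadoop-auth", "3.3.3"),
]


def index_flat(pairs):
    """Old shape (assumed): name -> version. A repeated name overwrites."""
    index = {}
    for name, version in pairs:
        index[name] = version  # second version silently replaces the first
    return index


flat = index_flat(claims)
print(f"{len(claims)} claims -> {len(flat)} indexed")  # 4 claims -> 2 indexed
print(flat["io.grpc.grpc-api"])  # 1.62.2 (1.60.0 is gone)
```

No error is raised at any point, which is why the loss only shows up as a count mismatch (566 vs. 460 for jars).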
**Fix.** Switch indexers to multimaps:
- `_index_npm` / `_index_python` → `dict[str, set[str]]` (name → versions)
- `_index_jar` → `dict[str, dict[str, str]]` (artifact → version → basename)
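A rough sketch of the two multimap shapes follows. The helper names `index_versions` and `index_jars` and the jar basenames are illustrative assumptions, not the actual `_index_*` functions, which also do their own parsing.

```python
from collections import defaultdict


def index_versions(pairs):
    """name -> set of versions (the npm / python shape)."""
    index = defaultdict(set)
    for name, version in pairs:
        index[name].add(version)
    return dict(index)


def index_jars(triples):
    """artifact -> {version -> basename} (the jar shape)."""
    index = defaultdict(dict)
    for artifact, version, basename in triples:
        index[artifact][version] = basename
    return dict(index)


# Hypothetical input; basenames are made up for illustration.
jars = index_jars([
    ("org.eclipse.jetty.jetty-server", "9.4.20.v20190813",
     "jetty-server-9.4.20.v20190813.jar"),
    ("org.eclipse.jetty.jetty-server", "11.0.20",
     "jetty-server-11.0.20.jar"),
])
print(len(jars["org.eclipse.jetty.jetty-server"]))  # 2, both versions kept
```

With these shapes, a repeated name adds to the inner collection instead of replacing the previous entry.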
Refactor `diff_simple` / `diff_jars` to emit per-version `added` / `stale`
and a per-name `drift` shaped `(name, sorted_claimed, sorted_real)`. Render
multi-version drift as
```
~ jetty-server: LICENSE-binary=9.4.20.v20190813, 11.0.20
bundled=9.4.20.v20190813, 11.0.21
```
falling back to the existing single-version form when there's only one
version on each side. As a side benefit, `added`/`stale` lines now include the
version (a regression from #4693, which printed bare names).
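To make the new result shape concrete, here is a simplified sketch of a set-based diff. It is one reading of the description above, not the PR's implementation: it assumes a name present on only one side yields per-version `added` / `stale` entries, while a name present on both sides with differing version sets yields one `drift` tuple. The plain `sorted()` call is a lexical string sort, which may differ from the ordering the script uses, and `render_drift` only approximates the one-line sample form.

```python
def diff_simple(claimed, real):
    """Compare two name -> set-of-versions maps (sketch, see assumptions)."""
    added, stale, drift = [], [], []
    for name in sorted(set(claimed) | set(real)):
        c = claimed.get(name, set())
        r = real.get(name, set())
        if not c:
            # Bundled but never claimed: one entry per version.
            added.extend((name, v) for v in sorted(r))
        elif not r:
            # Claimed but not bundled at any version.
            stale.extend((name, v) for v in sorted(c))
        elif c != r:
            # Same name on both sides, different version sets.
            drift.append((name, sorted(c), sorted(r)))
    return added, stale, drift


def render_drift(entry):
    name, claimed_versions, real_versions = entry
    return (f"~ {name}: LICENSE-binary={', '.join(claimed_versions)} "
            f"bundled={', '.join(real_versions)}")


# Hypothetical inputs; "old-lib" and "new-lib" are made-up names.
claimed = {"io.grpc.grpc-api": {"1.60.0", "1.62.2"}, "old-lib": {"1.0"}}
real = {"io.grpc.grpc-api": {"1.62.2"}, "new-lib": {"2.0"}}

added, stale, drift = diff_simple(claimed, real)
print(added)   # [('new-lib', '2.0')]
print(stale)   # [('old-lib', '1.0')]
print(render_drift(drift[0]))
```

Keeping drift per name rather than per version lets a single line show everything claimed and everything bundled for that library.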
### Any related issues, documentation, discussions?
Follow-up to #4693 (the original `--ignore-transitive-version` PR) and #4691
(parent task). Discovered and reported by @bobbai00 — without per-service CI
splitting (#4632) this would manifest as spurious passes whenever two services
bundled different versions of the same lib and the LICENSE-binary documented
both.
### How was this PR tested?
Smoke-tested locally against the real combined LICENSE-binary (566 jar
claims, 113 python claims):
| Scenario | Strict | `--ignore-transitive-version` |
| --- | --- | --- |
| Clean run (multi-version preserved) | exit 0 (566 entries indexed, was 460) | exit 0 |
| Drop one of two versions of a transitive artifact | exit 1 (drift) | exit 0 (informational drift, both versions shown) |
| Drop all versions of an artifact | exit 1 (stale) | exit 1 |
| Direct-dep version drift (single version) | exit 1 | exit 1 |
| Transitive-dep version drift (single version) | exit 1 | exit 0 |
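The exit-code behaviour in the table can be summarised as a small decision function. This is a sketch of the policy as the table describes it, not the script's control flow; `exit_code` and its parameters are hypothetical names, and `added` entries are left out because the table does not cover them.

```python
def exit_code(stale, direct_drift, transitive_drift,
              ignore_transitive_version=False):
    """Return 1 on a failing check, 0 otherwise (policy sketch only)."""
    if stale or direct_drift:
        return 1  # fails in both modes
    if transitive_drift and not ignore_transitive_version:
        return 1  # strict mode treats transitive drift as a failure
    return 0      # clean, or transitive drift reported as informational


# Rows of the table, as (stale, direct_drift, transitive_drift) flags.
clean = (False, False, False)
transitive = (False, False, True)
all_dropped = (True, False, False)
direct = (False, True, False)

for row in (clean, transitive, all_dropped, direct):
    print(exit_code(*row), exit_code(*row, ignore_transitive_version=True))
```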
Sample multi-version drift output:
```
DRIFT (transitive, informational) JVM jars:
~ org.glassfish.jersey.ext.jersey-metainf-services: LICENSE-binary=2.25.1, 3.0.12 bundled=3.0.12
```
### Was this PR authored or co-authored using generative AI tooling?
Generated-by: Claude Code (claude-opus-4-7)