asf-tooling commented on issue #878:
URL:
https://github.com/apache/tooling-trusted-releases/issues/878#issuecomment-4407534127
<!-- gofannon-issue-triage-bot v2 -->
**Automated triage** — analyzed at `main@751c2146`
**Type:** `new_feature` • **Classification:** `actionable` •
**Confidence:** `medium`
**Application domain(s):** `automated_checks`
### Summary
The issue requests that the lightweight license header checker support
specifying file exclusions per archive rather than globally across all source
archives. Currently, `policy_source_excludes_lightweight` is a project-level
list applied to all archives. @sbp noted in the discussion that ATR's checker
correctly only checks source archives for the ASF-specific header, but vendored
files (like those in maven-mvnd donated through IP clearance) need per-archive
exclusion capability. The issue is explicitly marked as 'not a priority
requirement because RAT mode is available'. The existing path construction in
`_headers_check_core_logic` already includes the artifact basename, so
archive-specific glob patterns could theoretically work already, but there's no
explicit UI or data model support for per-archive exclusion mappings.
### Where this lives in the code today
#### `atr/tasks/checks/license.py` — `headers` (lines 180-213)
_needs modification_
This is where ignore_lines are fetched from project policy globally. To
support per-archive exclusions, this function would need to look up exclusions
specific to the current artifact.
```python
async def headers(args: checks.FunctionArguments) -> results.Results | None:
"""Check that all source files have valid license headers."""
recorder = await args.recorder(CHECK_VERSION_HEADERS)
if not (artifact_abs_path := await recorder.abs_path()):
return None
is_source = await recorder.primary_path_is_source()
if is_source:
project = await recorder.project()
if project.policy_license_check_mode == sql.LicenseCheckMode.RAT:
return None
archive_dir = await checks.resolve_archive_dir(args)
if archive_dir is None:
await recorder.failure(
"Extracted archive tree is not available",
{"rel_path": args.primary_rel_path},
)
return None
log.info(f"Checking license headers for {artifact_abs_path} (rel:
{args.primary_rel_path})")
project = await recorder.project()
ignore_lines: list[str] = []
excludes_source: str
if is_source:
ignore_lines = project.policy_source_excludes_lightweight
excludes_source = "policy" if ignore_lines else "none"
else:
excludes_source = "none"
artifact_basename = os.path.basename(str(artifact_abs_path))
return await _headers_core(recorder, archive_dir, artifact_basename,
ignore_lines, excludes_source)
```
#### `atr/storage/readers/checks.py` — `GeneralPublic.ignores_matcher`
(lines 88-103)
_currently does this_
The existing post-hoc ignore system already supports per-path patterns via
primary_rel_path_glob and member_rel_path_glob, offering a partial workaround
for per-archive filtering of results after checks run.
```python
async def ignores_matcher(
self,
project_key: safe.ProjectKey,
) -> Callable[[sql.CheckResult], bool]:
ignores = await self.__data.check_result_ignore(
project_key=str(project_key),
).all()
def match(cr: sql.CheckResult) -> bool:
for ignore in ignores:
if self.__check_ignore_match(cr, ignore):
# log.info(f"Ignoring check result {cr} due to ignore
{ignore}")
return True
return False
return match
```
#### `atr/shared/ignores.py` — `AddIgnoreForm` (lines 44-55)
_currently does this_
The post-hoc ignore form already supports primary_rel_path_glob which
includes the archive filename, offering per-archive result filtering as a
workaround.
```python
class AddIgnoreForm(form.Form):
variant: ADD = form.value(ADD)
release_glob: str = form.label("Release pattern", default="")
revision_number: safe.OptionalRevisionNumber = form.label("Revision
number (literal)", default="")
checker_glob: str = form.label("Checker pattern", default="")
primary_rel_path_glob: str = form.label("Primary rel path pattern",
default="")
member_rel_path_glob: str = form.label("Member rel path pattern",
default="")
status: form.Enum[IgnoreStatus] = form.label(
"Status",
widget=form.Widget.SELECT,
)
message_glob: str = form.label("Message pattern", default="")
```
### Where new code would go
- `atr/models/sql.py` — after policy_source_excludes_lightweight field in
project model
A new field (e.g., policy_source_excludes_lightweight_per_archive as a
JSON dict mapping archive glob patterns to exclusion lists) would be needed in
the Project model.
- `atr/tasks/checks/license.py` — after line where ignore_lines is assigned
from project.policy_source_excludes_lightweight
Logic to look up per-archive exclusions based on the artifact basename and
merge them with global exclusions.
### Proposed approach
There are two approaches to solve this:
1. **Document existing capability**: The `_headers_check_core_logic` already
prepends the artifact basename to paths before matching (constructing
`/archive-name/member/path`). Users could already write patterns like
`/maven-mvnd-1.0.4-src*/build/publish-on-homebrew.sh` in
`source_excludes_lightweight` to target specific archives. The issue may partly
be that this isn't documented or clear in the UI.
2. **Add explicit per-archive exclusions**: Add a new project policy field
(e.g., `policy_source_excludes_lightweight_per_archive`) that maps archive name
patterns to lists of path exclusions. In the `headers` function, after fetching
global `ignore_lines`, also look up any per-archive exclusions matching the
current `artifact_basename` and merge them. This would require a new field in
the SQL model, a form to manage it in the release policy UI, and updated cache
invalidation keys.
Given the issue is explicitly 'not a priority' and the existing mechanism
already supports archive-specific patterns (since paths include the basename),
the most pragmatic first step would be documenting the existing capability. If
that's insufficient, the explicit per-archive field would be the proper feature
addition.
### Open questions
- Does `util.create_path_matcher` support globbing that includes the archive
basename (e.g., `/maven-mvnd-*-src*/build/*.sh`)? If so, the current system may
already support per-archive exclusions with documentation.
- What is the exact structure of the Project model's
`policy_source_excludes_lightweight` field? Is it a simple list[str] or does it
already have any structure that could accommodate per-archive mappings?
- Should the solution be a pre-check exclusion (preventing the check from
running on certain files) or is the post-hoc ignore system (CheckResultIgnore
with primary_rel_path_glob) sufficient as a workaround?
- The issue notes 'not a priority requirement because RAT mode is available'
- should this be deferred until there's more demand?
### Files examined
- `atr/tasks/checks/license.py`
- `atr/shared/ignores.py`
- `atr/storage/readers/checks.py`
- `atr/storage/writers/checks.py`
- `atr/tasks/checks/rat.py`
- `atr/get/checks.py`
- `atr/post/ignores.py`
- `atr/get/ignores.py`
### Related issues
This issue appears related to: #1214.
_Both concern overly strict license/file checks that should be configurable
or skipped for certain file types_
---
*Draft from a triage agent. A human reviewer should validate before merging
any change. The agent did not run tests or verify diffs apply.*
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]