asf-tooling commented on issue #1021:
URL:
https://github.com/apache/tooling-trusted-releases/issues/1021#issuecomment-4409914355
<!-- gofannon-issue-triage-bot v2 -->
**Automated triage** — analyzed at `main@2da7807a`
**Type:** `bug_fix` • **Classification:** `actionable` • **Confidence:**
`high`
**Application domain(s):** `artifact_upload`, `automated_checks`,
`shared_infrastructure`
### Summary
The issue correctly identifies that DISALLOWED_FILENAMES/DISALLOWED_SUFFIXES
checking only happens asynchronously in the background task
`atr/tasks/checks/paths.py`, allowing forbidden files to persist in storage
until flagged. The `atr/detection.py` module already has `validate_directory()`
called synchronously in `create_revision_with_quarantine()` before any revision
is committed — adding disallowed filename checks there provides
defense-in-depth. Additionally, upload-time rejection in `atr/post/upload.py`
would give immediate user feedback and prevent files from even being staged
temporarily.
### Where this lives in the code today
#### `atr/detection.py` — `validate_directory` (lines 112-122)
_needs modification_
Already called synchronously before revision commit in
create_revision_with_quarantine; adding disallowed filename checks here
provides defense-in-depth.
```python
def validate_directory(directory: pathlib.Path) -> list[str]:
# TODO: Report errors using the whole relative path, not just the
filename
errors: list[str] = []
for path in directory.rglob("*"):
if path.is_symlink():
errors.append(f"{path.name}: Symbolic links are not allowed")
continue
if path.is_file():
if error := _validate_file(path):
errors.append(error)
return errors
```
#### `atr/post/upload.py` — `_add_files` (lines 55-70)
_needs modification_
Upload handler where filename validation should be added before files are
passed to storage.
```python
async def _add_files(
session: web.Committer,
add_form: shared.upload.AddFilesForm,
project_key: safe.ProjectKey,
version_key: safe.VersionKey,
*,
wants_json: bool,
) -> tuple[web.QuartResponse, int] | web.WerkzeugResponse:
try:
file_data = add_form.file_data
async with storage.write(session) as write:
wacp = await write.as_project_committee_participant(project_key)
creation_error, number_of_files, was_quarantined = await
wacp.release.upload_files(
project_key, version_key, file_data
)
```
### Where new code would go
- `atr/detection.py` — after symbol validate_directory
Add a standalone filename validation function that can be used at upload
time and is also called within validate_directory.
- `atr/post/upload.py` — after symbol _add_files (at start of try block)
Validate filenames in file_data before passing to storage writer.
### Proposed approach
The fix has two parts. First, add disallowed filename/suffix checking to
`detection.validate_directory()` which is already called synchronously in
`create_revision_with_quarantine()` before any revision is committed. This
provides strong defense-in-depth — even if an upload path is missed, files
won't be finalized. Second, add early validation in `_add_files()` in
`atr/post/upload.py` to reject disallowed filenames before they even reach the
storage layer, providing immediate user feedback.
A new helper function `validate_filename()` in `atr/detection.py`
encapsulates the check logic (importing from `atr.analysis`), making it
reusable from both `validate_directory` (synchronous, for directories already
on disk) and from the upload handler (for individual filenames before staging).
The existing background check in `atr/tasks/checks/paths.py` can remain as an
additional safety net for files that arrive through other paths (e.g., SVN
import, quarantine extraction).
### Suggested patches
#### `atr/detection.py`
Add disallowed filename validation to validate_directory and expose a
reusable validate_filename function.
````diff
--- a/atr/detection.py
+++ b/atr/detection.py
@@ -18,6 +18,7 @@
import pathlib
from typing import Final
import puremagic
+import atr.analysis as analysis
import atr.models.attestable as models
import atr.models.safe as safe
@@ -93,6 +94,26 @@
return quarantine_paths
+def validate_filename(filename: str) -> str | None:
+ """Validate a single filename against disallowed patterns.
+
+ Returns an error message if the filename is disallowed, or None if it's
acceptable.
+ """
+ if filename in analysis.DISALLOWED_FILENAMES:
+ return f"{filename}: Disallowed filename"
+ for suffix in analysis.DISALLOWED_SUFFIXES:
+ if filename.endswith(suffix):
+ return f"{filename}: Disallowed file extension '{suffix}'"
+ return None
+
+
def validate_directory(directory: pathlib.Path) -> list[str]:
# TODO: Report errors using the whole relative path, not just the
filename
errors: list[str] = []
@@ -101,6 +122,9 @@
errors.append(f"{path.name}: Symbolic links are not allowed")
continue
if path.is_file():
+ if filename_error := validate_filename(path.name):
+ errors.append(filename_error)
+ continue
if error := _validate_file(path):
errors.append(error)
return errors
````
#### `atr/post/upload.py`
Add upload-time filename validation before files are passed to the storage
layer for immediate user feedback.
````diff
--- a/atr/post/upload.py
+++ b/atr/post/upload.py
@@ -22,6 +22,7 @@
import atr.blueprints.post as post
import atr.db as db
+import atr.detection as detection
import atr.errors as errors
import atr.get as get
import atr.log as log
@@ -57,6 +58,17 @@
try:
file_data = add_form.file_data
+ # Validate filenames before passing to storage
+ for file_item in file_data:
+ filename = getattr(file_item, "filename", None) or ""
+ if error := detection.validate_filename(filename):
+ if wants_json:
+ return quart.jsonify(ok=False, message=error), 400
+ await quart.flash(error, "error")
+ return await session.redirect(
+ get.upload.selected,
+ project_key=project_key,
+ version_key=version_key,
+ )
+
async with storage.write(session) as write:
wacp = await write.as_project_committee_participant(project_key)
creation_error, number_of_files, was_quarantined = await
wacp.release.upload_files(
````
#### `tests/unit/test_upload_disallowed.py`
New test file verifying upload-time rejection of disallowed filenames.
````diff
--- /dev/null
+++ b/tests/unit/test_upload_disallowed.py
@@ -0,0 +1,62 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import pathlib
+import tempfile
+
+import pytest
+
+import atr.detection as detection
+
+
+class TestValidateFilename:
+ def test_disallowed_filename_rejected(self):
+ assert detection.validate_filename("id_rsa") is not None
+ assert detection.validate_filename("id_ed25519") is not None
+ assert detection.validate_filename(".htaccess") is None # not in
DISALLOWED_FILENAMES
+ assert detection.validate_filename("KEYS") is not None
+ assert detection.validate_filename("desktop.ini") is not None
+ assert detection.validate_filename("Thumbs.db") is not None
+
+ def test_disallowed_suffix_rejected(self):
+ assert detection.validate_filename("server.key") is not None
+ assert detection.validate_filename("private.key") is not None
+
+ def test_allowed_filename_passes(self):
+ assert detection.validate_filename("apache-project-1.0.tar.gz") is
None
+ assert detection.validate_filename("README.md") is None
+ assert detection.validate_filename("LICENSE") is None
+
+
+class TestValidateDirectoryDisallowedFiles:
+ def test_directory_with_disallowed_file(self):
+ with tempfile.TemporaryDirectory() as tmp:
+ tmp_path = pathlib.Path(tmp)
+ (tmp_path / "id_rsa").write_text("fake key")
+ (tmp_path / "good_file.txt").write_text("hello")
+ errors = detection.validate_directory(tmp_path)
+ assert any("id_rsa" in e for e in errors)
+
+ def test_directory_with_disallowed_suffix(self):
+ with tempfile.TemporaryDirectory() as tmp:
+ tmp_path = pathlib.Path(tmp)
+ (tmp_path / "server.key").write_text("fake key")
+ errors = detection.validate_directory(tmp_path)
+ assert any(".key" in e for e in errors)
+
+ def test_directory_all_clean(self):
+ with tempfile.TemporaryDirectory() as tmp:
+ tmp_path = pathlib.Path(tmp)
+ (tmp_path /
"apache-project-1.0.tar.gz").write_bytes(b"\x1f\x8b" + b"\x00" * 100)
+ errors = detection.validate_directory(tmp_path)
+ # Should not have disallowed filename errors (may have content
mismatch)
+ assert not any("Disallowed" in e for e in errors)
````
### Open questions
- What is the exact type/structure of `add_form.file_data`? The upload-time
diff assumes it's iterable with items having a `.filename` attribute (standard
Werkzeug FileStorage), but this needs confirmation from `atr/shared/upload.py`.
- Should the SVN import path (`_svn_import`) also validate filenames? It
imports from a trusted SVN repository but the defense-in-depth in
`validate_directory` covers it since it goes through
`create_revision_with_quarantine`.
- Should `.htaccess` be added to DISALLOWED_FILENAMES? The issue mentions it
but it's not currently in the frozenset (though `is_skippable` in analysis.py
handles it differently).
- The `detection.validate_directory` check happens after files are written
to a temp dir but BEFORE they're committed to the revision directory. The issue
says 'files never reach storage' — does this temp-dir-then-validate pattern
satisfy that criterion or must validation happen before any disk write?
### Files examined
- `atr/analysis.py`
- `atr/post/upload.py`
- `atr/tasks/checks/paths.py`
- `atr/storage/writers/revision.py`
- `tests/unit/test_upload_json.py`
- `atr/detection.py`
- `atr/archives.py`
- `atr/storage/__init__.py`
---
*Draft from a triage agent. A human reviewer should validate before merging
any change. The agent did not run tests or verify diffs apply.*
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]