asf-tooling commented on issue #1021:
URL: 
https://github.com/apache/tooling-trusted-releases/issues/1021#issuecomment-4409914355

   <!-- gofannon-issue-triage-bot v2 -->
   
   **Automated triage** — analyzed at `main@2da7807a`
   
   **Type:** `bug_fix`  •  **Classification:** `actionable`  •  **Confidence:** 
`high`
   **Application domain(s):** `artifact_upload`, `automated_checks`, 
`shared_infrastructure`
   
   ### Summary
   The issue correctly identifies that DISALLOWED_FILENAMES/DISALLOWED_SUFFIXES 
checking only happens asynchronously in the background task 
`atr/tasks/checks/paths.py`, allowing forbidden files to persist in storage 
until flagged. The `atr/detection.py` module already has `validate_directory()` 
called synchronously in `create_revision_with_quarantine()` before any revision 
is committed — adding disallowed filename checks there provides 
defense-in-depth. Additionally, upload-time rejection in `atr/post/upload.py` 
would give immediate user feedback and prevent files from even being staged 
temporarily.
   
   ### Where this lives in the code today
   
   #### `atr/detection.py` — `validate_directory` (lines 112-122)
   _needs modification_
   Already called synchronously before revision commit in 
create_revision_with_quarantine; adding disallowed filename checks here 
provides defense-in-depth.
   
   ```python
   def validate_directory(directory: pathlib.Path) -> list[str]:
       # TODO: Report errors using the whole relative path, not just the 
filename
       errors: list[str] = []
       for path in directory.rglob("*"):
           if path.is_symlink():
               errors.append(f"{path.name}: Symbolic links are not allowed")
               continue
           if path.is_file():
               if error := _validate_file(path):
                   errors.append(error)
       return errors
   ```
   
   #### `atr/post/upload.py` — `_add_files` (lines 55-70)
   _needs modification_
   Upload handler where filename validation should be added before files are 
passed to storage.
   
   ```python
   async def _add_files(
       session: web.Committer,
       add_form: shared.upload.AddFilesForm,
       project_key: safe.ProjectKey,
       version_key: safe.VersionKey,
       *,
       wants_json: bool,
   ) -> tuple[web.QuartResponse, int] | web.WerkzeugResponse:
       try:
           file_data = add_form.file_data
   
           async with storage.write(session) as write:
               wacp = await write.as_project_committee_participant(project_key)
               creation_error, number_of_files, was_quarantined = await 
wacp.release.upload_files(
                   project_key, version_key, file_data
               )
   ```
   
   ### Where new code would go
   - `atr/detection.py` — after symbol validate_directory
     Add a standalone filename validation function that can be used at upload 
time and is also called within validate_directory.
   - `atr/post/upload.py` — after symbol _add_files (at start of try block)
     Validate filenames in file_data before passing to storage writer.
   
   ### Proposed approach
   The fix has two parts. First, add disallowed filename/suffix checking to 
`detection.validate_directory()` which is already called synchronously in 
`create_revision_with_quarantine()` before any revision is committed. This 
provides strong defense-in-depth — even if an upload path is missed, files 
won't be finalized. Second, add early validation in `_add_files()` in 
`atr/post/upload.py` to reject disallowed filenames before they even reach the 
storage layer, providing immediate user feedback.
   
   A new helper function `validate_filename()` in `atr/detection.py` 
encapsulates the check logic (importing from `atr.analysis`), making it 
reusable from both `validate_directory` (synchronous, for directories already 
on disk) and from the upload handler (for individual filenames before staging). 
The existing background check in `atr/tasks/checks/paths.py` can remain as an 
additional safety net for files that arrive through other paths (e.g., SVN 
import, quarantine extraction).
   
   ### Suggested patches
   
   #### `atr/detection.py`
   Add disallowed filename validation to validate_directory and expose a 
reusable validate_filename function.
   
   ````diff
   --- a/atr/detection.py
   +++ b/atr/detection.py
   @@ -18,6 +18,7 @@
    import pathlib
    from typing import Final
    
    import puremagic
    
   +import atr.analysis as analysis
    import atr.models.attestable as models
    import atr.models.safe as safe
    
   @@ -93,6 +94,26 @@
        return quarantine_paths
    
    
   +def validate_filename(filename: str) -> str | None:
   +    """Validate a single filename against disallowed patterns.
   +
   +    Returns an error message if the filename is disallowed, or None if it's 
acceptable.
   +    """
   +    if filename in analysis.DISALLOWED_FILENAMES:
   +        return f"{filename}: Disallowed filename"
   +    for suffix in analysis.DISALLOWED_SUFFIXES:
   +        if filename.endswith(suffix):
   +            return f"{filename}: Disallowed file extension '{suffix}'"
   +    return None
   +
   +
    def validate_directory(directory: pathlib.Path) -> list[str]:
        # TODO: Report errors using the whole relative path, not just the 
filename
        errors: list[str] = []
   @@ -101,6 +122,9 @@
                errors.append(f"{path.name}: Symbolic links are not allowed")
                continue
            if path.is_file():
   +            if filename_error := validate_filename(path.name):
   +                errors.append(filename_error)
   +                continue
                if error := _validate_file(path):
                    errors.append(error)
        return errors
   ````
   
   #### `atr/post/upload.py`
   Add upload-time filename validation before files are passed to the storage 
layer for immediate user feedback.
   
   ````diff
   --- a/atr/post/upload.py
   +++ b/atr/post/upload.py
   @@ -22,6 +22,7 @@
    import atr.blueprints.post as post
    import atr.db as db
   +import atr.detection as detection
    import atr.errors as errors
    import atr.get as get
    import atr.log as log
   @@ -57,6 +58,17 @@
        try:
            file_data = add_form.file_data
    
   +        # Validate filenames before passing to storage
   +        for file_item in file_data:
   +            filename = getattr(file_item, "filename", None) or ""
   +            if error := detection.validate_filename(filename):
   +                if wants_json:
   +                    return quart.jsonify(ok=False, message=error), 400
   +                await quart.flash(error, "error")
   +                return await session.redirect(
   +                    get.upload.selected,
   +                    project_key=project_key,
   +                    version_key=version_key,
   +                )
   +
            async with storage.write(session) as write:
                wacp = await write.as_project_committee_participant(project_key)
                creation_error, number_of_files, was_quarantined = await 
wacp.release.upload_files(
   ````
   
   #### `tests/unit/test_upload_disallowed.py`
   New test file verifying upload-time rejection of disallowed filenames.
   
   ````diff
   --- /dev/null
   +++ b/tests/unit/test_upload_disallowed.py
   @@ -0,0 +1,62 @@
   +# Licensed to the Apache Software Foundation (ASF) under one
   +# or more contributor license agreements.  See the NOTICE file
   +# distributed with this work for additional information
   +# regarding copyright ownership.  The ASF licenses this file
   +# to you under the Apache License, Version 2.0 (the
   +# "License"); you may not use this file except in compliance
   +# with the License.  You may obtain a copy of the License at
   +#
   +#   http://www.apache.org/licenses/LICENSE-2.0
   +#
   +# Unless required by applicable law or agreed to in writing,
   +# software distributed under the License is distributed on an
   +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
   +# KIND, either express or implied.  See the License for the
   +# specific language governing permissions and limitations
   +# under the License.
   +
   +import pathlib
   +import tempfile
   +
   +import pytest
   +
   +import atr.detection as detection
   +
   +
   +class TestValidateFilename:
   +    def test_disallowed_filename_rejected(self):
   +        assert detection.validate_filename("id_rsa") is not None
   +        assert detection.validate_filename("id_ed25519") is not None
   +        assert detection.validate_filename(".htaccess") is None  # not in 
DISALLOWED_FILENAMES
   +        assert detection.validate_filename("KEYS") is not None
   +        assert detection.validate_filename("desktop.ini") is not None
   +        assert detection.validate_filename("Thumbs.db") is not None
   +
   +    def test_disallowed_suffix_rejected(self):
   +        assert detection.validate_filename("server.key") is not None
   +        assert detection.validate_filename("private.key") is not None
   +
   +    def test_allowed_filename_passes(self):
   +        assert detection.validate_filename("apache-project-1.0.tar.gz") is 
None
   +        assert detection.validate_filename("README.md") is None
   +        assert detection.validate_filename("LICENSE") is None
   +
   +
   +class TestValidateDirectoryDisallowedFiles:
   +    def test_directory_with_disallowed_file(self):
   +        with tempfile.TemporaryDirectory() as tmp:
   +            tmp_path = pathlib.Path(tmp)
   +            (tmp_path / "id_rsa").write_text("fake key")
   +            (tmp_path / "good_file.txt").write_text("hello")
   +            errors = detection.validate_directory(tmp_path)
   +            assert any("id_rsa" in e for e in errors)
   +
   +    def test_directory_with_disallowed_suffix(self):
   +        with tempfile.TemporaryDirectory() as tmp:
   +            tmp_path = pathlib.Path(tmp)
   +            (tmp_path / "server.key").write_text("fake key")
   +            errors = detection.validate_directory(tmp_path)
   +            assert any(".key" in e for e in errors)
   +
   +    def test_directory_all_clean(self):
   +        with tempfile.TemporaryDirectory() as tmp:
   +            tmp_path = pathlib.Path(tmp)
   +            (tmp_path / 
"apache-project-1.0.tar.gz").write_bytes(b"\x1f\x8b" + b"\x00" * 100)
   +            errors = detection.validate_directory(tmp_path)
   +            # Should not have disallowed filename errors (may have content 
mismatch)
   +            assert not any("Disallowed" in e for e in errors)
   ````
   
   ### Open questions
   - What is the exact type/structure of `add_form.file_data`? The upload-time 
diff assumes it's iterable with items having a `.filename` attribute (standard 
Werkzeug FileStorage), but this needs confirmation from `atr/shared/upload.py`.
   - Should the SVN import path (`_svn_import`) also validate filenames? It 
imports from a trusted SVN repository but the defense-in-depth in 
`validate_directory` covers it since it goes through 
`create_revision_with_quarantine`.
   - Should `.htaccess` be added to DISALLOWED_FILENAMES? The issue mentions it 
but it's not currently in the frozenset (though `is_skippable` in analysis.py 
handles it differently).
   - The `detection.validate_directory` check happens after files are written 
to a temp dir but BEFORE they're committed to the revision directory. The issue 
says 'files never reach storage' — does this temp-dir-then-validate pattern 
satisfy that criterion or must validation happen before any disk write?
   
   ### Files examined
   - `atr/analysis.py`
   - `atr/post/upload.py`
   - `atr/tasks/checks/paths.py`
   - `atr/storage/writers/revision.py`
   - `tests/unit/test_upload_json.py`
   - `atr/detection.py`
   - `atr/archives.py`
   - `atr/storage/__init__.py`
   
   ---
   *Draft from a triage agent. A human reviewer should validate before merging 
any change. The agent did not run tests or verify diffs apply.*


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to