asf-tooling commented on issue #1136:
URL: 
https://github.com/apache/tooling-trusted-releases/issues/1136#issuecomment-4407485565

   <!-- gofannon-issue-triage-bot v2 -->
   
   **Automated triage** — analyzed at `main@751c2146`
   
   **Type:** `refactor`  •  **Classification:** `actionable`  •  
**Confidence:** `high`
   **Application domain(s):** `infrastructure`, `admin_operations`
   
   ### Summary
   The issue requests moving expensive permission-scanning logic from 
synchronous server startup to a background maintenance task. @sbp confirmed 
`_app_dirs_setup` takes ~11 seconds due to recursive permission enforcement on 
archives/unfinished directories. @alitheg made a start by (a) adding the daily 
maintenance task infrastructure (the `_storage_maintenance` stub) and (b) 
deferring a heavy cyclonedx import chain, cutting startup from 15s to 5s. The 
remaining work is to move the actual permission enforcement logic into the 
`_storage_maintenance` function, which currently contains only `pass`.
   
   ### Where this lives in the code today
   
   #### `atr/server.py` — `_app_dirs_setup` (lines 168-207)
   _needs modification_
   This is the main bottleneck (~11s) because it recursively walks and chmod's 
all files in archives and unfinished directories during synchronous app 
creation.
   
   ```python
   def _app_dirs_setup(state_dir_str: str, hot_reload: bool) -> None:
       """Setup application directories."""
       if not os.path.isdir(state_dir_str):
           raise RuntimeError(f"State directory not found: {state_dir_str}")
       os.chdir(state_dir_str)
       if hot_reload is False:
           print(f"Working directory changed to: {os.getcwd()}")
   
       # Note that the hypercorn directories are not managed by ATR
       directories_to_ensure = [
           pathlib.Path(state_dir_str) / "audit",
           pathlib.Path(state_dir_str) / "cache",
           pathlib.Path(state_dir_str) / "database",
           pathlib.Path(state_dir_str) / "hypercorn" / "logs",
           pathlib.Path(state_dir_str) / "hypercorn" / "secrets",
           pathlib.Path(state_dir_str) / "logs",
           pathlib.Path(state_dir_str) / "runtime",
           pathlib.Path(state_dir_str) / "secrets" / "curated",
           pathlib.Path(state_dir_str) / "secrets" / "generated",
           pathlib.Path(paths.get_archives_dir()),
           pathlib.Path(paths.get_downloads_dir()),
           pathlib.Path(paths.get_finished_dir()),
           pathlib.Path(paths.get_quarantined_dir()),
           pathlib.Path(paths.get_tmp_dir()),
           pathlib.Path(paths.get_unfinished_dir()),
       ]
       archives_dir = pathlib.Path(paths.get_archives_dir())
       unfinished_dir = pathlib.Path(paths.get_unfinished_dir())
       enforce_permissions = not config.is_dev_environment()
       for directory in directories_to_ensure:
           directory.mkdir(parents=True, exist_ok=True)
           if not enforce_permissions:
               continue
           # Some directories need custom permissions
           if directory == archives_dir:
               _enforce_archives_permissions(archives_dir)
           elif directory == unfinished_dir:
               _enforce_unfinished_permissions(unfinished_dir)
           else:
               util.chmod_directories(directory, permissions=0o755)
   ```
   
   #### `atr/server.py` — `_enforce_archives_permissions` (lines 711-742)
   _needs modification_
   This function does expensive recursive directory traversal. It should be 
moved to run as a background maintenance task rather than blocking startup.
   
   ```python
   def _enforce_archives_permissions(archives_dir: pathlib.Path) -> None:
       if not archives_dir.exists():
           return
       fixed_files = 0
       fixed_dirs = 0
   
       # Set ancestor directories of archive files to 755
       for dirpath, _, _ in os.walk(archives_dir, topdown=True):
           path = pathlib.Path(dirpath)
           depth = len(path.relative_to(archives_dir).parts)
           if depth < 3:
               os.chmod(path, 0o755)
   
       # Set archive files to 444
       for file_path in archives_dir.rglob("*"):
           if not file_path.is_file():
               continue
           depth = len(file_path.relative_to(archives_dir).parts)
           if (depth >= 3) and (stat.S_IMODE(file_path.stat().st_mode) != 
0o444):
               os.chmod(file_path, 0o444)
               fixed_files += 1
   
       # Set archive directories to 555
       for dirpath, _, _ in os.walk(archives_dir, topdown=False):
           path = pathlib.Path(dirpath)
           depth = len(path.relative_to(archives_dir).parts)
           if (depth >= 3) and (stat.S_IMODE(path.stat().st_mode) != 0o555):
               os.chmod(path, 0o555)
               fixed_dirs += 1
   
       if (fixed_files > 0) or (fixed_dirs > 0):
           log.info(f"Fixed archive permissions: {fixed_files} files to 0o444, 
{fixed_dirs} directories to 0o555")
   ```
   
   #### `atr/server.py` — `_enforce_unfinished_permissions` (lines 745-758)
   _needs modification_
   Another expensive recursive directory traversal that should be moved to the 
maintenance task.
   
   ```python
   def _enforce_unfinished_permissions(unfinished_dir: pathlib.Path) -> None:
       # Set ancestor directories of revisions to 755
       for dirpath, _dirnames, _filenames in os.walk(unfinished_dir, 
topdown=True):
           path = pathlib.Path(dirpath)
           depth = len(path.relative_to(unfinished_dir).parts)
           if depth < 3:
               os.chmod(path, 0o755)
   
       # Set revision directories and their descendants to 555
       for dirpath, _dirnames, _filenames in os.walk(unfinished_dir, 
topdown=False):
           path = pathlib.Path(dirpath)
           depth = len(path.relative_to(unfinished_dir).parts)
           if depth >= 3:
               os.chmod(path, 0o555)
   ```
   
   #### `atr/tasks/maintenance.py` — `_storage_maintenance` (lines 89-90)
   _extension point_
   @alitheg added this stub as the designated location for the deferred 
permission enforcement logic.
   
   ```python
   async def _storage_maintenance():
       pass
   ```
   
   #### `atr/tasks/maintenance.py` — `run` (lines 40-65)
   _currently does this_
   The daily maintenance orchestrator already calls `_storage_maintenance()` — 
it just needs the permission logic filled in.
   
   ```python
   @checks.with_model(args.MaintenanceArgs)
   async def run(task_args: args.MaintenanceArgs) -> results.Results | None:
       """Run maintenance."""
       log.info(f"Starting maintenance (user: {task_args.asf_uid})")
   
       try:
           # We schedule first to keep the start time approximately consistent
           # (and knowing this task will finish before it's re-scheduled)
           await tasks.schedule_next(task_args.asf_uid, 
task_args.next_schedule_seconds, tasks.run_maintenance)
   
           await _expired_pats_maintenance()
           await _session_data_maintenance()
           await _storage_maintenance()
           await _workflow_ssh_keys_maintenance()
   
           log.info(
               "Storage maintenance completed successfully",
           )
   
           return results.Maintenance(
               kind="maintenance",
           )
       except Exception as e:
           error_msg = f"Unexpected error during maintenance: {e!s}"
           log.exception("Maintenance failed with unexpected error")
           raise MaintenanceError(error_msg) from e
   ```
   
   ### Where new code would go
   - `atr/tasks/maintenance.py` — after symbol _storage_maintenance
     The permission enforcement logic needs to be implemented here, replacing 
the `pass` statement.
   
   ### Proposed approach
   The fix has two parts: (1) Remove the expensive permission enforcement from 
`_app_dirs_setup` in production so it no longer blocks startup. Keep only the 
cheap `mkdir` calls. (2) Implement the permission enforcement in 
`_storage_maintenance()` which already runs daily via the maintenance task 
system. The permission functions (`_enforce_archives_permissions` and 
`_enforce_unfinished_permissions`) can be moved from `atr/server.py` to 
`atr/tasks/maintenance.py` (or kept in server.py and imported), and called from 
within `_storage_maintenance` using `asyncio.to_thread` since they are blocking 
I/O operations. The `util.chmod_directories` call for standard directories 
(0o755) is fast and can remain in startup, or also be deferred — the key cost 
is the recursive walks of archives and unfinished dirs.
   
   ### Suggested patches
   
   #### `atr/server.py`
   Remove expensive permission enforcement from startup; keep only directory 
creation.
   
   ````diff
   --- a/atr/server.py
   +++ b/atr/server.py
   @@ -134,16 +134,10 @@
        archives_dir = pathlib.Path(paths.get_archives_dir())
        unfinished_dir = pathlib.Path(paths.get_unfinished_dir())
   -    enforce_permissions = not config.is_dev_environment()
        for directory in directories_to_ensure:
            directory.mkdir(parents=True, exist_ok=True)
   -        if not enforce_permissions:
   -            continue
   -        # Some directories need custom permissions
   -        if directory == archives_dir:
   -            _enforce_archives_permissions(archives_dir)
   -        elif directory == unfinished_dir:
   -            _enforce_unfinished_permissions(unfinished_dir)
   -        else:
   -            util.chmod_directories(directory, permissions=0o755)
   +        # Permission enforcement is now handled by the daily maintenance 
task
   +        # (see atr/tasks/maintenance.py :: _storage_maintenance)
   +        # This avoids blocking server startup for 10+ seconds on large 
state dirs
   ````
   
   #### `atr/tasks/maintenance.py`
   Implement permission enforcement in the daily maintenance task, replacing 
the empty stub.
   
   ````diff
   --- a/atr/tasks/maintenance.py
   +++ b/atr/tasks/maintenance.py
   @@ -16,11 +16,16 @@
    
    import datetime
   +import os
   +import pathlib
   +import stat
    import time
    from typing import Final
    
    import sqlmodel
    
    import atr.db as db
   +import atr.config as config
    import atr.log as log
    import atr.models.args as args
    import atr.models.results as results
    import atr.models.sql as sql
   +import atr.paths as paths
    import atr.tasks as tasks
    import atr.tasks.checks as checks
   +import atr.util as util
    
    _EXPIRED_TOKEN_RETENTION_DAYS: Final[int] = 30
    _FORM_ERROR_TTL_SECONDS: Final[int] = 24 * 60 * 60
   @@ -88,5 +93,62 @@
            await data.commit()
    
    
   -async def _storage_maintenance():
   -    pass
   +async def _storage_maintenance() -> None:
   +    """Enforce file and directory permissions on storage directories."""
   +    if config.is_dev_environment():
   +        return
   +
   +    import asyncio
   +
   +    archives_dir = pathlib.Path(paths.get_archives_dir())
   +    unfinished_dir = pathlib.Path(paths.get_unfinished_dir())
   +
   +    await asyncio.to_thread(_enforce_archives_permissions, archives_dir)
   +    await asyncio.to_thread(_enforce_unfinished_permissions, unfinished_dir)
   +
   +
   +def _enforce_archives_permissions(archives_dir: pathlib.Path) -> None:
   +    if not archives_dir.exists():
   +        return
   +    fixed_files = 0
   +    fixed_dirs = 0
   +
   +    # Set ancestor directories of archive files to 755
   +    for dirpath, _, _ in os.walk(archives_dir, topdown=True):
   +        path = pathlib.Path(dirpath)
   +        depth = len(path.relative_to(archives_dir).parts)
   +        if depth < 3:
   +            os.chmod(path, 0o755)
   +
   +    # Set archive files to 444
   +    for file_path in archives_dir.rglob("*"):
   +        if not file_path.is_file():
   +            continue
   +        depth = len(file_path.relative_to(archives_dir).parts)
   +        if (depth >= 3) and (stat.S_IMODE(file_path.stat().st_mode) != 
0o444):
   +            os.chmod(file_path, 0o444)
   +            fixed_files += 1
   +
   +    # Set archive directories to 555
   +    for dirpath, _, _ in os.walk(archives_dir, topdown=False):
   +        path = pathlib.Path(dirpath)
   +        depth = len(path.relative_to(archives_dir).parts)
   +        if (depth >= 3) and (stat.S_IMODE(path.stat().st_mode) != 0o555):
   +            os.chmod(path, 0o555)
   +            fixed_dirs += 1
   +
   +    if (fixed_files > 0) or (fixed_dirs > 0):
   +        log.info(f"Fixed archive permissions: {fixed_files} files to 0o444, 
{fixed_dirs} directories to 0o555")
   +
   +
   +def _enforce_unfinished_permissions(unfinished_dir: pathlib.Path) -> None:
   +    if not unfinished_dir.exists():
   +        return
   +
   +    # Set ancestor directories of revisions to 755
   +    for dirpath, _dirnames, _filenames in os.walk(unfinished_dir, 
topdown=True):
   +        path = pathlib.Path(dirpath)
   +        depth = len(path.relative_to(unfinished_dir).parts)
   +        if depth < 3:
   +            os.chmod(path, 0o755)
   +
   +    # Set revision directories and their descendants to 555
   +    for dirpath, _dirnames, _filenames in os.walk(unfinished_dir, 
topdown=False):
   +        path = pathlib.Path(dirpath)
   +        depth = len(path.relative_to(unfinished_dir).parts)
   +        if depth >= 3:
   +            os.chmod(path, 0o555)
   ````
   
   ### Open questions
   - Should the permission enforcement also run at startup (once, in 
background) via `before_serving` to ensure security posture immediately, or is 
daily maintenance sufficient?
   - Should `_enforce_archives_permissions` and 
`_enforce_unfinished_permissions` be removed from `atr/server.py` entirely, or 
kept for potential manual/admin invocation?
   - The `util.chmod_directories(directory, permissions=0o755)` call for 
standard directories (audit, cache, logs, etc.) is likely fast — should it 
remain in startup or also be deferred?
   - Is there additional work needed for the cyclonedx import chain deferral 
that @alitheg mentioned, or was that already committed?
   
   ### Files examined
   - `atr/server.py`
   - `atr/manager.py`
   - `atr/worker.py`
   - `scripts/check-perms`
   - `start-atr.sh`
   - `start-dev.sh`
   - `atr/tasks/__init__.py`
   - `atr/tasks/maintenance.py`
   
   ---
   *Draft from a triage agent. A human reviewer should validate before merging 
any change. The agent did not run tests or verify diffs apply.*


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to