asf-tooling commented on issue #979:
URL:
https://github.com/apache/tooling-trusted-releases/issues/979#issuecomment-4409937627
<!-- gofannon-issue-triage-bot v2 -->
**Automated triage** — analyzed at `main@2da7807a`
**Type:** `documentation` • **Classification:** `actionable` •
**Confidence:** `high`
**Application domain(s):** `shared_infrastructure`
### Summary
Issue #979 requests creation of a centralized documentation file
`atr/docs/resource-management.md` to satisfy ASVS 15.1.3 requirements. The
application already implements resource controls (upload limits, extraction
limits, task timeouts, worker pool management) but lacks a single document
inventorying all resource-intensive operations and their defenses. The existing
developer guide would need updating to link to this new page. No prior
discussion exists on this issue.
### Where this lives in the code today
#### `atr/models/args.py` — `SvnImport` (lines 175-183)
_currently does this_
Defines the arguments for the SVN import task. SVN import is one of the
resource-intensive operations that the new document should inventory.
```python
class SvnImport(schema.Strict):
    """Arguments for the task to import files from SVN."""

    svn_url: safe.RelPath
    revision: str
    target_subdirectory: str | None
    project_key: safe.ProjectKey
    version_key: safe.VersionKey
    asf_uid: str
```
### Where new code would go
- `atr/docs/resource-management.md` — new file
  The primary deliverable of this issue: centralized resource management
  documentation.
### Proposed approach
Create `atr/docs/resource-management.md` as a new documentation page in the
developer guide, following the existing documentation style (numbered sections,
navigation links, markdown). The document should inventory all
resource-intensive operations identified from the codebase (archive extraction,
SBOM generation, signature verification, SVN/rsync operations, etc.), document
the timeout chain architecture (HTTP request → task queue → worker process →
subprocess), list all configurable limits from `config.py`, and provide
monitoring/capacity planning guidance.
The developer guide index (`atr/docs/developer-guide.md`) should be updated
to include the new page as section 3.17. The content should reference actual
configuration values and code modules so it stays grounded in the
implementation.
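
To make the worker-to-subprocess part of the timeout chain concrete, here is a minimal sketch. It assumes the inner (subprocess) limit should be shorter than the outer (worker task) limit, so the external call fails on its own before the manager has to kill the worker. The names `WORKER_TASK_TIMEOUT_S`, `SUBPROCESS_TIMEOUT_S`, `run_bounded` and `check_chain` are hypothetical and do not come from the ATR codebase; the values are placeholders, not ATR's configured limits.

```python
import subprocess
import sys

# Hypothetical values for illustration only; the real limits live in ATR's
# configuration and worker manager and are not reproduced here.
WORKER_TASK_TIMEOUT_S = 2.0  # outer layer: budget for the whole task
SUBPROCESS_TIMEOUT_S = 0.5   # inner layer: budget for one external call


def run_bounded(cmd: list[str], timeout_s: float) -> str:
    """Run an external command and return "ok", "failed", or "timeout"."""
    try:
        completed = subprocess.run(cmd, capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child and waits for it before re-raising.
        return "timeout"
    return "ok" if completed.returncode == 0 else "failed"


def check_chain(timeouts_outer_to_inner: list[float]) -> bool:
    """Return True when each inner timeout is strictly shorter than its outer one."""
    pairs = zip(timeouts_outer_to_inner, timeouts_outer_to_inner[1:])
    return all(outer > inner for outer, inner in pairs)


if __name__ == "__main__":
    print(check_chain([WORKER_TASK_TIMEOUT_S, SUBPROCESS_TIMEOUT_S]))
    sleeper = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_bounded(sleeper, SUBPROCESS_TIMEOUT_S))
```

A check like `check_chain` could run at startup or in a test so that a configuration change cannot silently invert the ordering of the two layers.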
### Suggested patches
#### `atr/docs/resource-management.md`
Create the centralized resource management documentation as requested by the
issue.
````diff
--- /dev/null
+++ b/atr/docs/resource-management.md
@@ -0,0 +1,198 @@
+# 3.17. Resource management
+
+**Up**: `3.` [Developer guide](developer-guide)
+
+**Prev**: `3.16.` [ASFQuart usage](asfquart-usage)
+
+**Next**: (none)
+
+**Sections**:
+
+* [Introduction](#introduction)
+* [Resource-intensive operations inventory](#resource-intensive-operations-inventory)
+* [Timeout chain architecture](#timeout-chain-architecture)
+* [Per-user and per-application limits](#per-user-and-per-application-limits)
+* [Monitoring and alerting](#monitoring-and-alerting)
+* [Capacity planning](#capacity-planning)
+
+## Introduction
+
+ATR performs several resource-intensive operations as part of release management. This document inventories those operations, describes their timeout and resource limit chains, and provides guidance for maintaining availability under load. It satisfies ASVS 15.1.3 requirements for documenting time-consuming or resource-demanding functionality.
+
+## Resource-intensive operations inventory
+
+The following operations are identified as resource-intensive. Each runs as a background task via the worker pool (see [Tasks](tasks)) unless otherwise noted.
+
+| Operation | Task Type | Time Profile | Primary Defense |
+|-----------|-----------|-------------|----------------|
+| Archive extraction (tar.gz) | `TARGZ_INTEGRITY`, `TARGZ_STRUCTURE` | Seconds to minutes depending on archive size | `MAX_EXTRACT_SIZE` (2 GB), `EXTRACT_CHUNK_SIZE` (4 MB), worker timeout |
+| Archive extraction (zip) | `ZIPFORMAT_INTEGRITY`, `ZIPFORMAT_STRUCTURE` | Seconds to minutes depending on archive size | `MAX_EXTRACT_SIZE` (2 GB), `EXTRACT_CHUNK_SIZE` (4 MB), worker timeout |
+| SBOM generation (CycloneDX) | `GenerateCycloneDX`, `ConvertCycloneDX` | Minutes for large dependency trees | Worker timeout, subprocess timeout |
+| SBOM vulnerability scanning (OSV) | OSV scan tasks | Seconds to minutes depending on component count | Worker timeout, network timeout |
+| SBOM quality scoring | `ScoreArgs` tasks | Seconds | Worker timeout |
+| Signature verification (OpenPGP) | `SIGNATURE_CHECK` | Seconds per file | Worker timeout |
+| Hash verification | `HASHING_CHECK` | Seconds per file, proportional to file size | Worker timeout |
+| License compliance (file-level) | `LICENSE_FILES` | Seconds to minutes | Worker timeout |
+| License header scanning | `LICENSE_HEADERS` | Minutes for large archives | Worker timeout |
+| Apache RAT analysis | `RAT_CHECK` | Minutes for large source trees | Worker timeout, subprocess timeout |
+| File path validation | `PATHS_CHECK` | Seconds | Worker timeout |
+| SVN import | `SvnImport` | Minutes depending on repository size | Worker timeout, network timeout |
+| Git clone (Trusted Publishing) | SSH/GitHub operations | Minutes depending on repository size | SSH session timeout, worker timeout |
+| Rsync transfer | SSH upload pipeline | Minutes depending on transfer size | SSH session timeout, `UPLOAD_BODY_TIMEOUT` |
+| Email sending | `Send` task | Seconds | Worker timeout, SMTP timeout |
+| Vote initiation | `Initiate` task | Seconds | Worker timeout |
+| GitHub Actions workflow dispatch | `DistributionWorkflow` | Seconds for dispatch; minutes to hours for completion | Worker timeout, workflow status polling |
+| Metadata update (LDAP/Whimsy) | `Update` task | Seconds to minutes | Worker timeout, network timeout |
+| Quarantine validation | `QuarantineValidate` | Seconds to minutes depending on file count | Worker timeout |
+| Database pagination | HTTP request handlers | Milliseconds to seconds | Query limits, SQLite WAL mode |
+| File upload processing | HTTP request (synchronous) | Seconds to minutes | `MAX_CONTENT_LENGTH` (512 MB), `UPLOAD_BODY_TIMEOUT` (3600s) |
+
+## Timeout chain architecture
+
+ATR uses a layered timeout architecture to prevent any single operation from blocking the system indefinitely:
+
+```
+┌─────────────────────────────────────────────────────────────┐
+│ Layer 1: HTTP / Hypercorn │
+│ - Request timeout for synchronous operations │
+│ - UPLOAD_BODY_TIMEOUT (3600s) for upload streams │
+│ - MAX_CONTENT_LENGTH (512 MB) for request bodies │
+├─────────────────────────────────────────────────────────────┤
+│ Layer 2: Task Queue (SQLite) │
+│ - Tasks queued with QUEUED status │
+│ - No timeout at queue level (tasks wait until claimed) │
+├─────────────────────────────────────────────────────────────┤
+│ Layer 3: Worker Manager │
+│ - Monitors worker process lifetimes │
+│ - Terminates tasks exceeding time limits │
+│ - Replenishes worker pool on exit │
+│ - Checks health every few seconds │
+├─────────────────────────────────────────────────────────────┤
+│ Layer 4: Worker Process │
+│ - Claims and executes tasks sequentially │
+│ - Exits voluntarily after fixed task count │
+│ - Task-specific handler called with appropriate args │
+├─────────────────────────────────────────────────────────────┤
+│ Layer 5: Subprocess / External Call │
+│ - Apache RAT JAR execution │
+│ - CycloneDX CLI tools │
+│ - SVN/Git commands │
+│ - Network I/O (OSV API, SMTP, GitHub API) │
+│ - Individual process/socket timeouts │
+└─────────────────────────────────────────────────────────────┘
+```
+
+### Failure modes
+
+- **Worker timeout exceeded**: The manager terminates the worker process and marks the task as failed
+- **Subprocess hang**: If the subprocess has no timeout of its own, the worker timeout eventually triggers and the parent worker is killed
+- **Network timeout**: Individual HTTP/SMTP clients have their own timeouts, which fire before the worker timeout
+- **Upload timeout**: Hypercorn closes the connection after `UPLOAD_BODY_TIMEOUT`
+- **Memory exhaustion**: Workers exit after a fixed number of tasks to bound the effect of memory leaks; the manager spawns replacements
+
+## Per-user and per-application limits
+
+### Upload limits
+
+Configured in [`config.AppConfig`](/ref/atr/config.py:AppConfig):
+
+| Setting | Default | Purpose |
+|---------|---------|--------|
+| `MAX_CONTENT_LENGTH` | 512 MB | Maximum HTTP request body size |
+| `UPLOAD_BODY_TIMEOUT` | 3600 seconds | Maximum time to receive a request body |
+| `MAX_EXTRACT_SIZE` | 2 GB | Maximum total extracted size from an archive |
+| `EXTRACT_CHUNK_SIZE` | 4 MB | Chunk size during extraction (controls memory use) |
+
+### Worker pool resources
+
+The [`WorkerManager`](/ref/atr/manager.py:WorkerManager) maintains a fixed-size pool of worker processes. Configuration includes:
+
+- Number of concurrent workers (pool size)
+- Maximum tasks per worker before voluntary exit
+- Task timeout before forced termination
+- Health check interval
+
+### Session limits
+
+| Setting | Default | Purpose |
+|---------|---------|--------|
+| `MAX_SESSION_AGE` | 72 hours | Maximum browser session lifetime |
+| `ACCOUNT_CHECK_INTERVAL` | 300 seconds | LDAP re-validation interval |
+
+### Database access
+
+- SQLite WAL mode enables concurrent reads during writes
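+- `BEGIN IMMEDIATE` transactions take the write lock when the transaction starts, so a writer waits up front instead of failing on a lock upgrade mid-transaction
+- Query builders in [`db`](/ref/atr/db/__init__.py) provide structured, bounded queries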
+
+## Monitoring and alerting
+
+### Log files
+
+ATR writes structured logs to several locations configured in `AppConfig`:
+
+| Log | Path | Contents |
+|-----|------|----------|
+| Storage audit | `state/audit/storage-audit.log` | All storage write operations with timestamps |
+| Auth audit | `state/audit/auth-audit.log` | Authentication events |
+| Route performance | `state/logs/route-performance.log` | Request timing data |
+| Request log | `state/logs/requests.log` | HTTP request records |
+
+### Key metrics to monitor
+
+- **Task queue depth**: Number of tasks in `QUEUED` status (query `Task` table)
+- **Task failure rate**: Tasks marked as failed due to timeouts or errors
+- **Worker restarts**: Frequency of worker process replacement by the manager
+- **Request latency**: Route performance log entries exceeding thresholds
+- **Disk usage**: `state/` directory growth, especially archives and downloads
+- **Database size**: SQLite file size and WAL growth
+
+### Alerting recommendations
+
+- Alert on task queue depth exceeding 2× the worker pool size for more than 5 minutes
+- Alert on consecutive task failures for the same task type
+- Alert on disk usage exceeding 80% of available space
+- Alert on route performance exceeding 30 seconds for non-upload endpoints
+
+## Capacity planning
+
+### Scaling considerations
+
+ATR is designed for the Apache Software Foundation's release workflow. Key scaling factors:
+
+1. **Concurrent releases**: Each release candidate generates multiple check tasks (hashing, signature, license, RAT, archive integrity). Plan for peaks during release season.
+2. **Archive sizes**: Large source archives (hundreds of MB) increase extraction time and temporary disk usage. `MAX_EXTRACT_SIZE` bounds the worst case.
+3. **SBOM complexity**: Projects with thousands of dependencies produce large SBOMs that take longer to scan and score.
+4. **Network dependencies**: SVN imports, OSV scans, and GitHub API calls depend on external service availability.
+
+### Resource sizing guidance
+
+| Resource | Minimum | Recommended | Notes |
+|----------|---------|-------------|-------|
+| Worker pool | 2 | 4-8 | Scale with expected concurrent releases |
+| Disk (state dir) | 10 GB | 50+ GB | Depends on release artifact sizes |
+| RAM per worker | 512 MB | 1-2 GB | RAT and SBOM tools may use significant memory |
+| SQLite | — | SSD storage | WAL mode benefits from fast I/O |
+
+### Preventing availability loss
+
+- **Worker pool isolation**: Resource-intensive tasks run in separate worker processes, so they do not block HTTP request handling directly, although they still share CPU, memory, and disk with the web server
+- **Voluntary worker exit**: Bounds how much leaked memory can accumulate across many tasks
+- **Chunked extraction**: `EXTRACT_CHUNK_SIZE` prevents loading entire archives into memory
+- **Configurable limits**: All resource limits are environment-configurable, allowing tuning without code changes
+- **Graceful degradation**: Failed tasks can be retried; the system continues serving other requests
````
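
The chunked extraction defense named in the patch can be illustrated with a short sketch. It is not ATR's implementation: it assumes a tar archive, counts the bytes actually read rather than trusting the sizes declared in member headers, and stops once the running total passes the cap. The constant names mirror the settings in the patch, but the values are deliberately tiny; `ExtractionLimitError`, `measure_extracted_size` and `make_tar_gz` are hypothetical names introduced for this example.

```python
import io
import tarfile
from typing import BinaryIO

# Names mirror the settings in the patch; the values are deliberately tiny so
# the example runs quickly, and are not ATR's defaults.
MAX_EXTRACT_SIZE = 1024
EXTRACT_CHUNK_SIZE = 256


class ExtractionLimitError(Exception):
    """Raised when an archive expands beyond the allowed total size."""


def measure_extracted_size(archive: BinaryIO, max_size: int, chunk_size: int) -> int:
    """Stream every regular file in a tar archive, enforcing a total size cap."""
    total = 0
    with tarfile.open(fileobj=archive, mode="r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, links, and device entries
            stream = tar.extractfile(member)
            if stream is None:
                continue
            # Read one bounded chunk at a time so memory use stays near chunk_size.
            while chunk := stream.read(chunk_size):
                total += len(chunk)
                if total > max_size:
                    raise ExtractionLimitError(f"extracted more than {max_size} bytes")
    return total


def make_tar_gz(payload: bytes) -> io.BytesIO:
    """Build an in-memory .tar.gz holding one file, for demonstration only."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        info = tarfile.TarInfo(name="data.bin")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(measure_extracted_size(make_tar_gz(b"a" * 500), MAX_EXTRACT_SIZE, EXTRACT_CHUNK_SIZE))
```

Checking the running total after every chunk means a highly compressible archive is rejected shortly after it crosses the cap, instead of after the whole member has been decompressed.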
#### `atr/docs/developer-guide.md`
Add the new resource-management page to the developer guide index.
````diff
--- a/atr/docs/developer-guide.md
+++ b/atr/docs/developer-guide.md
@@ -16,6 +16,7 @@
* `3.14.` [Input validation](input-validation)
* `3.15.` [TLS security configuration](tls-security-configuration)
* `3.16.` [ASFQuart usage](asfquart-usage)
+* `3.17.` [Resource management](resource-management)
**Sections**:
````
### Open questions
- What are the exact worker pool size and task timeout values configured in
`WorkerManager`? The source for `atr/manager.py` was not provided, so any
specific numbers in the patch are estimates.
- Are there additional rate-limiting mechanisms (e.g., per-IP or per-user
request throttling) beyond the upload/size limits visible in config.py?
- What are the specific subprocess timeout values for Apache RAT, CycloneDX
CLI, and SVN/Git operations?
- Should the SSH session timeout for rsync/trusted publishing uploads be
documented here, and what is its configured value?
### Files examined
- `atr/docs/tasks.md`
- `atr/docs/overview-of-the-code.md`
- `atr/docs/developer-guide.md`
- `atr/models/args.py`
- `atr/models/results.py`
- `atr/config.py`
- `atr/db/__init__.py`
- `atr/storage/__init__.py`
### Related issues
This issue appears related to: #1049.
_Both address missing documentation of authorization rules and
resource-intensive operations_
---
*Draft from a triage agent. A human reviewer should validate before merging
any change. The agent did not run tests or verify diffs apply.*
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]