andrewmusselman opened a new issue, #26:
URL: https://github.com/apache/tooling-agents/issues/26
## Summary
The audit phase's analysis cache key in `asvs_audit.py` is `f"batch-{i}"`
where `i` is the batch index within a section. The cache namespace incorporates
ASVS section ID and file namespaces (so different sections / file scopes don't
collide), but within that scope the key is invariant under prompt changes. This
means edits to the Opus system prompt — including severity rubric,
instructions, examples — do not invalidate the cache. Cached Opus output
written under the old prompt is replayed on subsequent runs with
`clearCache=false`.
## Symptom
After editing the audit system prompt (e.g., to fix a calibration issue),
re-running the orchestrator with `clearCache=false` returns identical findings
to the previous run because every section hits cache. The fix appears to have
done nothing.
The current workaround is `clearCache=true`, but this is a sledgehammer — it
also wipes the source-code download from `code_namespace`, forcing a fresh
tarball pull from GitHub for every iteration. Slow and wasteful when all you
want is to re-audit with the new prompt.
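
The symptom can be reproduced in isolation with a toy in-memory cache. This is only a sketch: the `cache` dict stands in for the real CouchDB-backed store, and `audit_batch` is a hypothetical placeholder, not a function from `asvs_audit.py`.

```python
# Toy in-memory stand-in for the analysis cache (the real store is CouchDB).
cache: dict[str, str] = {}


def audit_batch(i: int, system_prompt: str) -> str:
    """Return findings for batch i, replaying cached output when present."""
    cache_key = f"batch-{i}"  # the key ignores the prompt entirely
    if cache_key in cache:
        return cache[cache_key]
    # Hypothetical placeholder for the real model call.
    findings = f"findings produced under prompt: {system_prompt!r}"
    cache[cache_key] = findings
    return findings


first = audit_batch(0, "old rubric")
second = audit_batch(0, "new rubric")  # prompt edited, cache not cleared
print(first == second)  # True: the output written under the old prompt is replayed
```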
## Proposed Fix
Include a hash of the analysis system prompt in the cache key:
```python
import hashlib

# In asvs_audit.py, near the existing cache key construction (~line 959):
# Short fingerprint of the system prompt, so any prompt edit changes the key.
prompt_fingerprint = hashlib.sha256(
    analysis_system_prompt.encode("utf-8")
).hexdigest()[:8]
cache_key = f"batch-{i}-{prompt_fingerprint}"
```
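
To see the effect of the fingerprint in isolation, here is a minimal standalone sketch. The helper name `make_cache_key` and the sample prompts are illustrative assumptions; only the key format and the `hashlib.sha256` call come from the proposal above.

```python
import hashlib


def make_cache_key(batch_index: int, system_prompt: str) -> str:
    """Build a batch cache key that changes whenever the prompt text changes."""
    fingerprint = hashlib.sha256(system_prompt.encode("utf-8")).hexdigest()[:8]
    return f"batch-{batch_index}-{fingerprint}"


old_key = make_cache_key(0, "Rate each finding as low, medium, or high.")
new_key = make_cache_key(0, "Rate each finding as low, medium, high, or critical.")
same_key = make_cache_key(0, "Rate each finding as low, medium, or high.")

print(old_key != new_key)   # edited prompt: different key, so a cache miss
print(old_key == same_key)  # unchanged prompt: same key, so a cache hit
```

Eight hex characters give 32 bits of fingerprint, which should be plenty for telling apart the handful of prompt revisions a project is likely to go through.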
When the prompt changes, every cache key changes and every section is
re-audited. The source download stays cached, and `clearCache` reverts to its
narrower meaning of "I want a fresh tarball."
## Decision: hash scope
Two reasonable choices:
1. **Full prompt hash** (simpler): any wording change anywhere invalidates
everything. Slightly over-eager, but it eliminates ambiguity.
2. **Rubric-only hash** (more targeted): extract the static rubric portion
and hash just that. Section-specific context and dynamic content don't trigger
invalidation.
Recommendation: start with full prompt hash. If over-invalidation gets
annoying in practice, refactor to rubric-only later. The cost of an unnecessary
cache miss is one extra Opus call per affected section — slow but bounded.
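
If the rubric-only option is ever needed, one possible shape is sketched below. It assumes the rubric is delimited by `<rubric>` and `</rubric>` markers inside the prompt; those markers are hypothetical and would have to be added to the real prompt. When the markers are missing, the sketch falls back to hashing the full prompt.

```python
import hashlib

# Hypothetical delimiters; the real prompt does not necessarily contain them.
RUBRIC_START = "<rubric>"
RUBRIC_END = "</rubric>"


def prompt_fingerprint(system_prompt: str, rubric_only: bool = False) -> str:
    """Hash the full prompt, or only the delimited rubric when requested."""
    text = system_prompt
    if rubric_only:
        start = system_prompt.find(RUBRIC_START)
        if start != -1:
            body_start = start + len(RUBRIC_START)
            end = system_prompt.find(RUBRIC_END, body_start)
            if end != -1:
                text = system_prompt[body_start:end]
        # If either marker is missing, text is still the full prompt.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:8]


prompt_a = "Context for section A.\n<rubric>High: exploitable.</rubric>"
prompt_b = "Context for section B.\n<rubric>High: exploitable.</rubric>"

# Same rubric, different surrounding context.
print(prompt_fingerprint(prompt_a, rubric_only=True)
      == prompt_fingerprint(prompt_b, rubric_only=True))
print(prompt_fingerprint(prompt_a) == prompt_fingerprint(prompt_b))
```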
## Side Effect: Orphan Cache Keys
Old cache keys with stale prompt fingerprints become orphans in the
`audit-cache:analysis:*` namespace. They are cheap to store in CouchDB but
accumulate over time.
Two options:
1. Ignore — storage cost is negligible.
2. Add a periodic sweep that deletes keys with old prompt fingerprints.
Could piggyback on the existing `cleanStaleReports` flow.
Defer the sweep until orphans actually cause problems.
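
Should the sweep become necessary, picking the keys to delete could look roughly like the sketch below. It assumes keys end in `batch-<index>-<8 hex chars>`, with pre-fix keys ending in plain `batch-<index>`; the full key layout in the sample data is hypothetical, and the actual CouchDB deletion is deliberately left out.

```python
import re

# Matches the trailing batch part of a key; the fingerprint group is absent
# for keys written before the fingerprint was introduced.
KEY_PATTERN = re.compile(r"batch-\d+(?:-(?P<fp>[0-9a-f]{8}))?$")


def find_orphans(keys, current_fingerprint):
    """Return batch keys whose fingerprint is missing or not the current one."""
    orphans = []
    for key in keys:
        match = KEY_PATTERN.search(key)
        if match is None:
            continue  # not a batch key, so leave it alone
        if match.group("fp") != current_fingerprint:
            orphans.append(key)
    return orphans


# Hypothetical key layout, for illustration only.
sample_keys = [
    "audit-cache:analysis:example:batch-0-aaaaaaaa",  # stale fingerprint
    "audit-cache:analysis:example:batch-0-bbbbbbbb",  # current fingerprint
    "audit-cache:analysis:example:batch-1",           # pre-fix key
    "audit-cache:other:thing",                        # unrelated key
]
print(find_orphans(sample_keys, "bbbbbbbb"))
```

Keeping the selection as a pure function makes it easy to dry-run against a key listing before anything is deleted.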
## Validation
1. Pick any single-line wording change to the audit system prompt.
2. Re-run the orchestrator on a previously-audited repo with
`clearCache=false`.
3. Verify in logs that Opus calls happen for every section (cache misses
everywhere).
4. Verify the source download was reused (no re-download).
5. Inspect CouchDB: `audit-cache:analysis:*` keys should have new
fingerprints; old ones still present as orphans.
## Related
- The severity calibration work that uncovered this — prompt edits silently
no-op'd until `clearCache=true` was used.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]